Monday, September 30, 2013

In my OAuth implementation, I talked about token provisioning for user and client by the OAuth server. I also talked about varying the token over time for calls by the same client on behalf of the same user. I would now like to describe the OAuth bearer token and its provisioning in my proposed OAuth implementation.
Bearer tokens are not the 'default' tokens. They are special tokens issued so that any party in possession of them can access the protected resource; the token merely encapsulates the user's authorization. As long as they are transmitted securely, are tamper-proof, and meet the other considerations discussed earlier, they can be used by any client to request the user's protected resources. This is a step toward making user authorizations easier to pass around for greater reach. In every use of a bearer token, the client must authenticate the server before sending the token over, and the tokens must be handled and passed around with care since presumably anybody who holds one can use it. I consider bearer tokens a simple relaxation of client validation, relying on the user's authorization alone to access protected resources.
Therefore, in my proposed OAuth provider implementation, I add a check to bypass client validation when the token is a bearer token.
This implies that we need to add information beyond the initial userId and clientId that we encrypt, so that the result can be recognized as a bearer token. How we store this information in the token is entirely up to us. We could set a bit or a digit to indicate a bearer token, or add a column to our token table, among other options, so that we can look up the token to see whether it is a bearer token. Based on my previous post, I want to add the bearer-token indicator to the original string before I encrypt it. This way the information is handy when I decrypt the token.
Note that I do not include other token attributes such as the redirect URI, scope, or state in the original information, because the only thing I'm validating is the client. Should client validation be bypassed, I would like that indicator available in the decrypted string.
Next, let us consider how to mark the bearer token in the original string before encryption. One way could be to use a special delimiter in the highest place order of the clientId.
Another way would be to use an even number for the rotation offset of the userId and clientId.
Yet another way could be to add the bearer indicator as a separate byte between the userId and the clientId.
Yet another way could be to zero out the clientId to say that anyone can use the token, or to otherwise special-case the clientId.
Thus there are many ways in which the bearer token can be marked in the original string.
No matter how we choose to capture the bearer information, we have to add a check so that client validation is bypassed when the token is a bearer token.
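As a concrete illustration, here is a minimal C# sketch of this idea, assuming the approach from the earlier posts of encrypting a fixed-length userId+clientId string; the field widths, the single-character bearer flag and the Encrypt/Decrypt helpers are assumptions for illustration, not the actual provider code.

public static class TokenCodec
{
    // Sketch only: pack a bearer flag into the plaintext before encryption so that
    // client validation can be bypassed when the decrypted token says "bearer".
    public static string CreateToken(long userId, int clientId, bool isBearer)
    {
        // fixed-length fields as discussed earlier: 16 + 8 characters, plus a 1-character flag
        var plaintext = userId.ToString().PadLeft(16, '0')
                      + clientId.ToString().PadLeft(8, '0')
                      + (isBearer ? "B" : "N");
        return Encrypt(plaintext); // assumed helper returning a base64 token
    }

    public static bool TryValidate(string token, int callingClientId, out long userId)
    {
        var plaintext = Decrypt(token); // assumed helper
        userId = long.Parse(plaintext.Substring(0, 16));
        var issuedToClientId = int.Parse(plaintext.Substring(16, 8));
        var isBearer = plaintext[24] == 'B';
        // bearer tokens skip client validation; all others must match the client they were issued to
        return isBearer || issuedToClientId == callingClientId;
    }

    // placeholders for the encryption discussed in the earlier posts
    private static string Encrypt(string plaintext) { return plaintext; }
    private static string Decrypt(string token) { return token; }
}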
In this post, I want to talk about the Application object in Microsoft Excel, which lets you work with Excel sheets programmatically.
You could write code such as follows:
Application.Windows("book1.xls").Activate
or
Set x1 = CreateObject("Excel.sheet")
x1.Application.Workbooks.Open "newbook.xls"

You also have a PivotTable and PivotChart object in the Excel Object Model.
They facilitate pivot transformation of data. You could use the PivotTableWizard to generate the PivotTable.
ActiveSheet.PivotTableWizard SourceType:=xlDatabase, SourceData:=Range("A1:C100")





Sunday, September 29, 2013

In the previous post, we discussed generating different tokens for the same user and client over time. We could do this based on a variant we add to the userId and clientId before encryption.
This was made possible with an hhmmss integer that we append to the UserId and ClientId.
This had the benefit that we used a fixed-length string to encrypt and decrypt. However, all of the original string may be known to the user. So if we want to keep some part of the original string unknown, we could add a magic number.
All of this could be avoided if we used an offset to rotate the userId+clientId string, based on, say, hhmmss. The logic to translate the hhmmss value to an offset that is within the bounds of 0 to the length of the original fixed-length string is entirely up to the server.
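A minimal sketch of that rotation, assuming the fixed-length userId+clientId plaintext from the earlier posts; the hhmmss-to-offset mapping is illustrative, and the validation side is assumed to know the issue time (for example, from the token table) in order to undo the rotation.

using System;

public static class TokenRotation
{
    // Map hhmmss to an offset within [0, length); the exact mapping is up to the server.
    private static int OffsetFrom(DateTime issuedAt, int length)
    {
        int hhmmss = issuedAt.Hour * 10000 + issuedAt.Minute * 100 + issuedAt.Second;
        return hhmmss % length;
    }

    public static string Rotate(string plaintext, DateTime issuedAt)
    {
        int offset = OffsetFrom(issuedAt, plaintext.Length);
        return plaintext.Substring(offset) + plaintext.Substring(0, offset);
    }

    public static string Unrotate(string rotated, DateTime issuedAt)
    {
        // rotating by (length - offset) undoes the original rotation
        int back = (rotated.Length - OffsetFrom(issuedAt, rotated.Length)) % rotated.Length;
        return rotated.Substring(back) + rotated.Substring(0, back);
    }
}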
We also mentioned that with the OAuth server, we preferred to keep a token table that gets populated for all OAuth token grants. At the same time, we said that the APIs that validate the tokens need not rely on the token table and can directly decrypt the tokens to identify the userId and the clientId. The table is only for auditing purposes at this point.
The token validation does not need to happen within the API implementation, although APIs could choose to do so. That validation could be done with the ActionFilterAttributes we discussed earlier, or even via HTTP handlers. The URI and query string could still be passed to the API implementation.
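For instance, here is a hedged sketch of such a filter for ASP.NET Web API; the attribute name, the bearer scheme check and the IsValid helper are assumptions rather than the actual implementation.

using System.Net;
using System.Net.Http;
using System.Web.Http.Controllers;
using System.Web.Http.Filters;

// Sketch: validate the token outside the API action so controllers stay focused on resources.
public class RequireTokenAttribute : ActionFilterAttribute
{
    public override void OnActionExecuting(HttpActionContext actionContext)
    {
        var auth = actionContext.Request.Headers.Authorization;
        if (auth == null || auth.Scheme != "Bearer" || !IsValid(auth.Parameter))
        {
            actionContext.Response = actionContext.Request.CreateResponse(
                HttpStatusCode.Unauthorized, "invalid_token");
        }
    }

    private static bool IsValid(string token)
    {
        // assumed: decrypt the token and recover the userId/clientId as in the earlier posts
        return !string.IsNullOrEmpty(token);
    }
}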
An id conversion may need to occur if the APIs would like to get the userId and clientId from the token itself, so that the API resources do not require a user or client in the resource qualifiers. This is because the id in the token is integer based; if the earlier implementation was based on GUIDs, the id and the GUID for the same user or client may need to be looked up.
APIs are not expected to do away with the user or client id, since the token is a security artifact and not a functional artifact. To the user it could be redundant or an additional task to provide user and client information along with a token. To the API implementation, irrespective of the caller, the userId and clientId could be parameters so that callers can look up the information when the parameter values change.
That said, most of the resources are based on user profiles and have nothing to do with client profiles. If the users are already being aliased so that they don't have to enter their userId, then the resource qualifiers for the API implementations can certainly choose not to require the userId and clientId. This will benefit the users who call the API.
I mentioned earlier that on mobile devices, where text entry and input are generally difficult at this time, it is better to require less when users have to specify the API directly.
Lastly, the database schema may need to be modified if the ID parameter is not already what is currently being proposed.

Saturday, September 28, 2013

In today's post I want to continue the discussion on generating tokens by encryption. We mentioned encrypting the UserId and ClientId and circulating the base64-encoded string as the token. We rely on the strength of the encryption to pass the information around. However, we missed mentioning a magic number to add to the UserId and ClientId before we encrypt. This is important for several reasons. First, we want to be able to vary the tokens for the same UserId and ClientId, and we want to make it hard for an attacker to guess how we vary them. One way to do this, for example, would be to use the current time, such as in hhmmss format, along with an undisclosed constant increment.
Another way to generate varying tokens without adding another number to the input is to rotate the userId and clientId: the string constituting the UserId and the ClientId is split and the two substrings exchange positions. After decryption, we can swap them again to get the original string. Since the clientIds are not expected to grow as large as the integer max, we can use the leftmost padding of the clientId as the delimiter.
Another way would be to use the Luhn algorithm that is used for validating credit card numbers. Here every second digit in the original sequence is doubled (subtracting 9 whenever the doubled value exceeds 9), the digits are summed, and the sum is multiplied by 9 and taken modulo 10. This gives the check digit to append at the end.
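A small sketch of that check-digit computation (standard Luhn, shown here on a digit string such as the concatenated userId and clientId; the input format is an assumption):

public static class Luhn
{
    // Sketch: compute the Luhn check digit for a string of decimal digits.
    public static int CheckDigit(string digits)
    {
        int sum = 0;
        // walk from the rightmost digit; double every second digit and subtract 9 when it exceeds 9
        for (int i = digits.Length - 1, position = 1; i >= 0; i--, position++)
        {
            int d = digits[i] - '0';
            if (position % 2 == 1) // these become the doubled positions once the check digit is appended
            {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
        }
        return (sum * 9) % 10; // the digit to append at the end
    }
}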
No matter how we vary the token generation, we can produce a base64-encoded token.
The OAuth spec does not restrict the tokens to being a hash. The tokens could be anything; if they store information about the user and the client for validations during API calls, that is not restricted. The spec itself goes so far as to mention such possibilities.
When considering the trade-offs between a hash and encrypted content for a token, the main caveat is whether the tokens are interpretable. If a malicious user can decrypt the tokens, that is a severe security vulnerability: tokens can then be faked and the OAuth server will compromise the protected resources. Since the token is the primary means to gain access to the protected resources of the user, there's no telling when the token will be faked and misused, and by whom. Tokens therefore need to be as tamper-proof as the user's password.
If the tokens were not encrypted content but merely a hash that has no significance in terms of contents, and their relevance is private to the server, then we are relying on a simpler security model. This means we don't have to keep upgrading the token-generating techniques as more ways are discovered to break them. It does, however, add persistence to the OAuth server, so that these hashes can be tied back to the user and the client.
Even though I have been advocating a token database, and a hash is a convenient, simpler security model, I firmly believe that we need both a token table and tokens that carry user and client information in a way that only the server can generate.
A simpler security model does not buy us the improvement in scale and performance in the API implementations where token validation is desirable. All API calls should have token validation; some could have user validation, but all should have client validation.

Friday, September 27, 2013

In today's post I want to cover encryption and decryption of data we have talked about in the previous posts. For example, we wanted to generate an OAuth token based on information about the user and the client. So we wanted to do something like
Encrypt(UserID  + ClientID) = Token 
where UserID is a large integer and ClientID is a regular integer. The original text can therefore be 16 and 8 characters in length, which gives us 24 characters. We use fixed lengths for both UserID and ClientID and pad left. If we want to keep the size of the encrypted text the same as the original string, we could run AES in a streaming mode (such as CTR), which does not pad the plaintext. If we were to use stronger algorithms or padded block modes, the size would likely bloat. And when you hex- or base64-encode the result, the text could double in size.
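As an illustration, here is a hedged C# sketch of producing such a token with AES; the key and IV handling, the field widths and the use of the default padded mode (which does grow the ciphertext slightly) are assumptions for the example, not the actual implementation.

using System;
using System.Security.Cryptography;
using System.Text;

public static class TokenCrypto
{
    // Sketch: encrypt the fixed-length UserID+ClientID string and hand out the base64 text as the token.
    public static string CreateToken(long userId, int clientId, byte[] key, byte[] iv)
    {
        string plaintext = userId.ToString().PadLeft(16, '0') + clientId.ToString().PadLeft(8, '0');
        using (var aes = Aes.Create())
        using (var encryptor = aes.CreateEncryptor(key, iv)) // key: 16/24/32 bytes, iv: 16 bytes
        {
            byte[] bytes = Encoding.ASCII.GetBytes(plaintext);
            byte[] cipher = encryptor.TransformFinalBlock(bytes, 0, bytes.Length);
            return Convert.ToBase64String(cipher);
        }
    }
}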
In the database, there is an easy way to encrypt and decrypt the string using the ENCRYPTBYKEY (key_GUID)

CREATE CERTIFICATE Certificate01
   ENCRYPTION BY PASSWORD = 'adasdefjasdsad7fafd98s0'
   WITH SUBJECT = 'OAuth tokens for user and client',
   EXPIRY_DATE = '20201212';
GO

-- the symmetric key itself must be created before it can be opened
CREATE SYMMETRIC KEY Token_Key_01
   WITH ALGORITHM = AES_256
   ENCRYPTION BY CERTIFICATE Certificate01;
GO

OPEN SYMMETRIC KEY Token_Key_01
   DECRYPTION BY CERTIFICATE Certificate01;

UPDATE Token
SET Token = EncryptByKey(Key_GUID('Token_Key_01'),
             Convert(Varchar(16), UserID) + Convert(Varchar(8), ClientID));
GO

I want to mention the choice of encryption algorithm. This could be AES, DES, or 3DES, among others (SHA-1, by contrast, is a hash rather than an encryption algorithm). The encryption could be either block based or stream based.
For our purposes, we want to keep the size of the tokens to a reasonable length. Since access tokens are likely passed around in the URI as a query parameter, this should not be very long.
Moreover, decryption should be quick, so that checking the tokens stays cheap and reasonable.
This way the storage of the tokens can be separated from the validation of the user and client against a token. The storage of the token is useful for auditing and similar purposes. The data is always pushed by the token-granting endpoint; there is no pull required from the database if the API implementations can merely decrypt the token.
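Along those lines, a hedged sketch of the API-side check that decrypts the token instead of pulling from the database; the key and IV management and the field widths are assumptions that mirror the encryption sketch above.

using System;
using System.Security.Cryptography;
using System.Text;

public static class TokenReader
{
    // Sketch: recover the UserID and ClientID directly from the token, so no database pull is needed.
    public static bool TryParse(string token, byte[] key, byte[] iv, out long userId, out int clientId)
    {
        userId = 0;
        clientId = 0;
        try
        {
            using (var aes = Aes.Create())
            using (var decryptor = aes.CreateDecryptor(key, iv))
            {
                byte[] cipher = Convert.FromBase64String(token);
                string plaintext = Encoding.ASCII.GetString(
                    decryptor.TransformFinalBlock(cipher, 0, cipher.Length));
                userId = long.Parse(plaintext.Substring(0, 16));
                clientId = int.Parse(plaintext.Substring(16, 8));
                return true;
            }
        }
        catch (FormatException) { return false; }
        catch (CryptographicException) { return false; }
    }
}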
OAuth token database usage considerations
Here is a list of items to consider when provisioning a token table for all OAuth logins:
1) During the login, once the token is created, we have all the parameters that were used to create the token. They have been validated, and hence the token was created. So the data entry should be reasonably fast; there should be no additional validations needed.
2) Since this is an insert into a table that stores only the last hour of tokens, the table is not expected to grow arbitrarily large, so insert performance should not suffer.
3) For the majority of tokens issued, the user credentials will be requested, so expect that the userId will be available. The table should populate the userIds.
4) When each API call is made that validates the token against the user and the client, the lookup should be fast. Since these are based on hashes or API keys, the columns should be indexed.
5) During an API call we are only looking at a single token in the table, so other callers should not be affected, since the same client is expected to make the calls for that user. If another instance of the client is making calls for the same user, a different token is expected. So performance will not suffer, and there should be no chance of performance degradation between API calls.



Thursday, September 26, 2013

In the previous post I mentioned an archival policy for a token table. Today during implementation I used the following logic
While exists records to be moved from active to archive
BEGIN
SELECT the id of the candidates a few at a time
INSERT the candidates that have not already been inserted
DELETE from active table these records
END
There are a couple of points that should be called out. For one, there is no user transaction scope involved. This is intentional: I don't want bulk inserts, which can take an enormous amount of time compared to these small batches, and which are also subject to failures and are not effective at moving records when the bulk insert fails.
Similarly, a user transaction to cover all cases is almost unnecessary when we can structure the operations so that there are checks in each step against the preceding steps; the latter helps with identifying failures and taking corrective actions. Moreover, the user transaction only ties the operations together while each operation is itself transacted, and user transactions are typically used with larger data movements. By taking only a few records, say a handful at a time, checking that they won't be actively used and that they have not already been copied to the destination by a previous aborted run, and deleting the records from the source so that they won't come up again, we remove the need for a user transaction and for taking locks on a range of records. Besides, by keeping the number of records to a handful during the move, we don't have to join the source and destination tables in full; we join only with the handful of records we are interested in. This tremendously improves query performance.
But how do we really guarantee that we are indeed moving the records without failures, and that this is done in a rolling manner until all records have been moved?
We do this by finding the records correctly.
How? We identify the set of records with the TOP keyword to take only the few we are interested in. Our search criterion is a predicate that does not change for these records, so these records will match the predicate again and again. Then we keep track of their IDs in a table variable that consists of a single column of these IDs. So long as these records are not being actively used and our logic is the only one doing the move, we effectively own these records. Then we take these records by their IDs and compare them with the source and destination. Since we check for all the undesirable cases, such as the original still left behind in the source or duplicate inserts into the destination, we know that the move is succeeding. Lastly, once we are done with our operation, we will not find these records again to move over, which guarantees that our process works. We just have to repeat the process over and over until all the records matching the criteria have been moved.
This process is robust, clean, tolerant to failures, and resumable from any point of the run.

Wednesday, September 25, 2013

In a previous post I showed a sample of the source index tool. Today we can talk about a web UI for a source index server. I started with an ASP.NET MVC4 WebAPI project and added a model and views for a search form and results display. The index built beforehand from a local enlistment of source seems to work just fine.
There is a caveat, though. The Tokenizer, or in this case the Analyzer in the Lucene.Net object model, is not able to tokenize some symbols. But this is customizable, and so long as we strip the preceding or succeeding delimiters such as '[', '{', '(', '=' and ':', we should be able to search on the syntax.
The size of the index is around a few MB for around 32,000 files of source with the basic tokenizer. So this is not expected to grow beyond a few GB of storage even if all the symbols were tokenized.
Since the indexing happens offline and can be rebuilt periodically or when the source has sufficiently changed, the index rebuilding does not affect the users for the source index server.
As an aside, for those following the previous posts on the OAuth token database, maintenance of the source index server is similar to maintenance of the token database. An archival policy helps keep the token database at a certain size, because all expired tokens are archived and the token expiry time is usually one hour. Similarly, the input index for the source index server can be periodically rebuilt and swapped with the existing index.
With regard to the analyzer, a StandardAnalyzer could be used instead of a SimpleAnalyzer. It takes a version parameter, and a more recent version could be used to better tokenize source code.
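For example, a hedged sketch of swapping in the StandardAnalyzer (this assumes Lucene.Net 3.x; the version constant and directory handling should be adjusted to whatever version is actually referenced):

using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

class StandardAnalyzerSample
{
    static void BuildIndex(DirectoryInfo indexDir)
    {
        // a version-aware StandardAnalyzer instead of the SimpleAnalyzer used in the earlier sample
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        var indexAt = FSDirectory.Open(indexDir);
        using (var indexer = new IndexWriter(indexAt, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            // add documents exactly as in the earlier sample; only the tokenization changes
        }
    }
}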
Declarative versus code based security
Configuration files are very helpful for declaring all the settings that affect a running application. These changes are made without affecting the code. Since security is configurable, security settings are often declared in the config file. In WCF, there are options to set the elements for different aspects of communication contracts in the configuration file as well as to set them via properties on objects in code. For example, a service binding has numerous parameters that can be set in the configuration file.
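As an illustration, here is a hedged sketch of setting in code the same binding security that would otherwise live in the <bindings> section of the configuration file; the service types, address and credential type are hypothetical.

using System;
using System.ServiceModel;

[ServiceContract]
interface ITokenService { [OperationContract] string Ping(); }

class TokenService : ITokenService { public string Ping() { return "pong"; } }

class HostSample
{
    static void Main()
    {
        // code-based counterpart of a <wsHttpBinding> security configuration
        var binding = new WSHttpBinding();
        binding.Security.Mode = SecurityMode.Transport;
        binding.Security.Transport.ClientCredentialType = HttpClientCredentialType.Windows;

        using (var host = new ServiceHost(typeof(TokenService), new Uri("https://localhost:8443/tokens")))
        {
            host.AddServiceEndpoint(typeof(ITokenService), binding, string.Empty);
            host.Open();
            Console.ReadLine();
        }
    }
}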
In production, these settings can be overridden, which provides a least intrusive mechanism to resolve issues. This is critical in production because code or data changes require tremendous validation to prevent regressions, and that is time-consuming and costly.
Configuration files are also scoped to the appdomain, the machine, and enterprise wide, so the changes can be applied at the appropriate scope. Usually the application configuration file alone has a lot of settings.
Changes need not be applied directly to the configuration file. They can be applied externally via a user settings file, which overrides the existing configuration files. This again is very desirable in production.
However, the settings can explode to a large number. When the configuration settings are too many, they are occasionally stored in a database. This adds an external repository for these settings.
Usually the settings are flat and easy to iterate, while some involve their own sections.
Settings provided this way are easy to validate as well.
Besides, CM tools are available that can apply configuration changes across a number of servers.

Tuesday, September 24, 2013

Passbook

Passbook is an application on iOS devices which lets you keep track of your tickets, rewards and coupons all in one place. It keeps the images and the bar codes so you can display them on your phone and redeem them. This works well for items we like to keep in our digital wallet and use later. Since most of these items are digital today, this wallet is a convenient app to carry. What is interesting is that the items are stored on the server and are accessed by a URI. And even more interesting is that the items can be anything as long as they meet the criteria for inclusion in the passbook.
If we consider the various cards we collect and dangle from our keychains, such as rewards cards and gift cards, these form a substantial collection and are therefore convenient to store digitally. E-tickets are another example. All of this fits in the passbook. It is a huge convenience to no longer carry loose scraps of paper or cards with you, and to have everything stored and tracked digitally.
If we could customize this so the user is able to drag and drop different items seen on different screens into the passbook, we add yet another convenience. Images or screenshots can be taken from all the different websites or applications using the phone's features, and if they could then be dragged and dropped into the passbook, the user would have done away with almost all the copying and pasting that ever needed to be done. Further, the application is one of the best ways to keep track of images outside the device, since they are now stored and reclaimed via web requests and responses.
There are already services that integrate with Passbook. For example, Square provides registration services to push cards into the passbook. Most retailers can now directly work with Square integration.
Square provides services that let you charge cards directly from the phone. This is a device you attach to the phone and it lets you swipe a credit card on the device. An application on the device reads the card information to process the payment. This application is called the register. This is a native application on the handheld device.
Square also provides a REST API to add cards to Passbook. These APIs let you create, update, delete and retrieve Passbook passes. With the CRUD functionality available via APIs, there is no need to maintain any passbook objects on the caller side. The requests and responses are made with a service credential, so there is a registration involved. And the same can be used for any type of card. So if an application could be written to flow passes to your Passbook using an integration service such as this, it could enable any user-desired card to be cut out and made to flow into the passbook.
CSS stylesheet elements review
Class is a selector. You can use a class from a stylesheet directly in the controls within the HTML body. All elements with that class can then be styled in the stylesheet as .classname { color: green }. Only one class can be specified per selector.
 ID is also a selector and can be used for unique elements such as a particular paragraph element.
The style sheets can be more than one and can be combined so that the individual sections are modular. Hence the term cascading style sheets. Both the author and the reader can influence the presentation through the stylesheets using the same language.
Conflicts between different stylesheets are resolved as follows:
1) Find all declarations that apply to the element or property. If there is no declaration, an inherited value is used.
2) Sort the declarations by explicit weight, such as the !important marking.
Weights of individual declarations can be raised with the !important notation.
3) Sort by origin: the author's style sheets override the reader's style sheet.
4) Sort by specificity of selector: more specific selectors override more general ones. For example, an id attribute is applied over a class.
5) Sort by order specified: when rules otherwise tie, the one specified later wins.

 

Monday, September 23, 2013

In the previous posts, we looked at possible solutions for keyword detection. Most of those relied on a large corpus for statistical modeling. I want to look into solutions on the client side, or at least ones that rely on web requests to a central server. When we relied on keyword detection using corpus data, there was a large volume of text parsed and statistics gathered. That is feasible when we are doing server-side computing, but in the case of client tools such as a Word plugin we hardly have that luxury unless we rely on web requests and responses. If the solution works on the server side, the text on the client side can be sent to the server for processing. If the server side is implemented with APIs, then it can enable several clients to connect and process.
This means we can write a website that works with different handheld devices for a variety of text. Since the text can be parsed from several types of documents, the proposed services can work with any of them.
The APIs can be simple in that they take input text and generate a list of keywords in the form of word offsets, leaving the pagination and rendering to the client. This works well because a text or a collection of words can be looked up by relative offsets, which guarantees a consistent and uniform way to access any incoming text.
Also, one of the things we could do with server-side processing is to build our corpus, if that is permissible. The incoming text is something we could mine for more representative text, which is why this strategy is important. The representative text is not only important from a collection perspective but also yields the common keywords that are most prevalent. Identifying this subset alone can help with pure client-side checks and processing.
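A minimal ASP.NET Web API sketch of such a service follows: it accepts raw text and returns keyword offsets, leaving pagination and rendering to the client. The controller, the route conventions and the ExtractKeywordOffsets helper are assumptions, not an existing API.

using System.Collections.Generic;
using System.Linq;
using System.Web.Http;

public class KeywordsController : ApiController
{
    [HttpPost]
    public IEnumerable<int> Post([FromBody] string text)
    {
        // return word offsets into the submitted text
        return ExtractKeywordOffsets(text ?? string.Empty);
    }

    private static IEnumerable<int> ExtractKeywordOffsets(string text)
    {
        // placeholder heuristic: treat capitalized words as keywords; a real implementation
        // would consult the corpus statistics discussed in the earlier posts
        var words = text.Split(' ');
        return Enumerable.Range(0, words.Length)
                         .Where(i => words[i].Length > 0 && char.IsUpper(words[i][0]))
                         .ToList();
    }
}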
Along the lines of the previous post, some common lambda expressions:
Func<bool> tester = () => true;
Func<string, string> selector = str => str.ToUpper();
Func<string, int, string[]> extract = (s,i) => i > 0 ? s.Split(delimiters,i) :  s.Split(delimiters);
Func<string, NumberStyles, IFormatProvider, int> parser = int.Parse;
button1.Click += async (sender, e) => { await Task.Delay(1000); textbox1.Text = "Voila!"; };
var firstLargerThanIndexNumbers = numbers.TakeWhile( (n,index) => n > index);
var query = people.Join(pets,
                        person => person,
                        pet => pet.Owner,
                        (person, pet) => new { OwnerName = person.Name, Pet = pet.Name });
grades.OrderByDescending(grade => grade).Take(3);
Enumerable.Repeat("Repetition", 15);
var squares = Enumerable.Range(1, 10).Select(x => x * x);

Sunday, September 22, 2013

Sentence reversal example from MSDN

var sentence = "the quick brown fox jumped over the lazy dog";
var words = sentence.Split(' ');

var reversedSentence  = words.Aggregate((sent, next) =>  next + " " + sent);

Console.WriteLine(reversedSentence);
Here's a quick reference on WCF contract attributes, jargon, etc.
1) MSMQ - message delivery guarantees such as when the receiver is offline, transactional delivery of messages, durable storage of messages, and exception management through dead-letter and poison-letter queues. Dead-letter queues are used when messages expire or are purged from other queues; poison queues are used when the max retry count is exceeded. Security over AD. The MSMQ management console facilitates administration.
NetMsmqBinding features:
ExactlyOnce, TimeToLive (default 1 day), QueueTransferProtocol (SRMP for HTTP exposure), ReceiveRetryCount, MaxRetryCycles, UseMsmqTracing, UseActiveDirectory, UseSourceJournal,
Client transactions - TransactionScope; queued service contract - TransactionFlow(TransactionFlowOption.Allowed). A minimal queued contract sketch follows at the end of this reference.
Security Features:
Authentication - mutual sender and receiver, Authorization - access level, Integrity , Confidentiality,
SecurityMode - None, Transport, Message, TransportWithMessageCredential and TransportCredentialOnly. Client credentials are passed with the transport layer.
ClientCredentialType can be WindowsClient  property, HttpDigest property,  UserName property, ClientCertificate property, ServiceCertificate property, IssuedToken property, Peer property,
SecurityPrincipal - roles, identity. ServiceSecurityContext - claims, identity.
Claims based security model - security tokens and claims. using ClaimType, Right and Resource.
A X.509 token has a claim set where a list of claims selected from an indexer set are issued by a particular issuer. Authorization calls based on custom claims
Windows CardSpace is used for creating, managing and sharing digital identities in a secure and reliable manner. CardSpace usually has  a local STS that can issue SAML tokens. When a card is used to authenticate, a local or remote STS looks at the claims in the card to generate a token.
In a federated security, AAA is sometimes delegated to STS. The client application authenticates to the STS to request a token for a particular user. The STS returns a signed and encrypted token that can be presented to the relying party.
Exception handling - SOAP faults have faultCode, faultString, faultActor, and detail.
Exception, SystemException, CommunicationException, FaultException, FaultException<T>.
BindingFeatures - Transport protocols, message encoding, message version, transport security, message security, duplex, reliable messaging and transactions.
Reliability  - implemented via RM Buffer on both client and server side and session maintenance. RequireOrderedDelivery, retry attempts, SessionThrottling. Reliable sessions can be configured with Acknowledgement Interval, Flow Control, Inactivity Timeout, Max pending Channels, Max retry count, Max Transfer Size Window, Ordered etc.
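A minimal sketch of a queued (MSMQ) contract along the lines above; the contract name, payload and binding values are illustrative, and the service method joins the dequeue transaction rather than flowing one from the client.

using System;
using System.ServiceModel;

[ServiceContract]
public interface IOrderService
{
    [OperationContract(IsOneWay = true)] // queued operations must be one-way
    void SubmitOrder(string orderXml);
}

public class OrderService : IOrderService
{
    [OperationBehavior(TransactionScopeRequired = true, TransactionAutoComplete = true)]
    public void SubmitOrder(string orderXml)
    {
        // a failure here returns the message to the queue; past MaxRetryCycles it goes to the poison queue
    }
}

public static class QueuedClientSetup
{
    public static NetMsmqBinding CreateBinding()
    {
        // ExactlyOnce requires a transactional queue; TimeToLive defaults to one day
        return new NetMsmqBinding { ExactlyOnce = true, TimeToLive = TimeSpan.FromDays(1) };
    }
}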
Here are some more along with the previous post:

Decorator Pattern :  Here we don't change the interface as we did in the adapter; we wrap the object to add new functionality. Extension methods are an example, as is the Stream class specifically: a BufferedStream, for instance, can wrap an existing FileStream or any other stream.
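For instance, a small sketch of the Stream example (the file path is illustrative):

using System.IO;

class DecoratorSample
{
    static void Main()
    {
        // BufferedStream decorates the FileStream: same Stream interface, added buffering.
        using (var file = new FileStream(@"C:\temp\tokens.log", FileMode.OpenOrCreate))
        using (var buffered = new BufferedStream(file))
        {
            buffered.WriteByte(0x42);
        }
    }
}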

Iterator pattern :  This is evident from the IEnumerable pattern in .NET. It is very helpful with LINQ expressions, where you can treat the collection as an IEnumerable and invoke the standard query operators. The GetEnumerator() method on IEnumerable and MoveNext() on the returned IEnumerator enable the iteration.

Observer pattern : This is used to notify changes from one class to others. Here we use an interface so that any object can subscribe to notifications, which are raised by calling the Notify method on each observer.
public interface IObserver {
    void Notify(State s);
}
public class Subject
{
    private readonly List<IObserver> observers = new List<IObserver>();
    public void Add(IObserver observer) { observers.Add(observer); }
    public void Remove(IObserver observer) { observers.Remove(observer); }
    public void NotifyObservers(State s)
    {
        observers.ForEach(x => x.Notify(s));
    }
}

Strategy pattern : When we are able to switch between different ways to do the same task, such as sorting where we take a parameter for different comparisons, we implement the strategy pattern. Here the IComparer interface enables different comparisons between elements, so that the same sorting operation on the same input can yield different results based on the different algorithms used.
public class ArrayList : IList, ICollection, IEnumerable, ICloneable
{
    public virtual void Sort(IComparer comparer);
}

Saturday, September 21, 2013

I want to post some samples of select design patterns:
1) Builder pattern - This pattern separates the construction of a complex object from its representation so that the same construction process can create different representations.
public class RTFReader
{
    private readonly TextConverter builder;
    private readonly TokenStream tokens; // TokenStream, Token and the FONT/PARA payload types are assumed from the RTF tokenizer

    public RTFReader(TextConverter t, TokenStream source)
    {
        builder = t;
        tokens = source;
    }

    public void ParseRTF()
    {
        var t = tokens.Next();
        while (t != null)
        {
            switch (t.Type)
            {
                case TokenType.Char:
                    builder.ConvertCharacter(t.Char);
                    break;
                case TokenType.Font:
                    builder.ConvertFont(t.Font);
                    break;
                case TokenType.Para:
                    builder.ConvertParagraph(t.Para);
                    break;
            }
            t = tokens.Next();
        }
    }
}

public abstract class TextConverter
{
    // FONT and PARA stand in for the token payload types assumed above
    public abstract void ConvertCharacter(char c);
    public abstract void ConvertFont(FONT f);
    public abstract void ConvertParagraph(PARA p);
}

BTW - StringBuilder in .NET is not an example of the Builder design pattern.

Factory Pattern: Here we use an interface for creating an object but let the sub-classes decide the class to instantiate. In the .NET library, the WebRequest class, which is used to make a request and receive a response, is an example.

public abstract class WebRequest {

    public static WebRequest Create(string requestUriString);
}
The Create method creates various instances of WebRequest subclasses such as HttpWebRequest, FileWebRequest, FtpWebRequest, etc.

Adapter Pattern: These can often be confused with the Decorator pattern, but they are different.
The Decorator pattern extends functionality dynamically. Adapters make one interface work with another, so they change interfaces, unlike the decorator.

SqlClient follows the adapter pattern: each provider is an adapter for its specific database. A class adapter uses multiple inheritance to adapt interfaces.

public sealed class SqlDataAdapter : DbDataAdapter, IDbDataAdapter, IDataAdapter, ICloneable
{
}

The key here is to inherit one and implement the other.

Friday, September 20, 2013

In the previous post, we mentioned how we can index and store text with Lucene so that we can build a source index server. I also mentioned a caveat: unlike the Java version, which may have a method to add files recursively from a directory, the Lucene.Net library does not come with one. So you build an index this way:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Lucene.Net;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Store;

namespace SourceSearch
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Count() != 1)
            {
                Console.WriteLine("Usage: SourceSearch <term>");
                return;
            }

            var indexAt = SimpleFSDirectory.Open(new DirectoryInfo(Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData)));
            using (var indexer = new IndexWriter(
                indexAt,
                new SimpleAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED))
            {

                var src = new DirectoryInfo(@"C:\code\text");
                var source = new SimpleFSDirectory(src);

                src.EnumerateFiles("*.cs", SearchOption.AllDirectories).ToList()
                    .ForEach(x =>
                        {
                            using (var reader = File.OpenText(x.FullName))
                            {
                                var doc = new Document();
                                doc.Add(new Field("contents", reader));
                                doc.Add(new Field("title", x.FullName, Field.Store.YES, Field.Index.ANALYZED));
                                indexer.AddDocument(doc);
                            }
                        });

                indexer.Optimize();
                Console.WriteLine("Total number of files indexed : " + indexer.MaxDoc());
            }

            using (var reader = IndexReader.Open(indexAt, true))
            {
                var pos = reader.TermPositions(new Term("contents", args.First().ToLower()));
                while (pos.Next())
                {
                    Console.WriteLine("Match in document " + reader.Document(pos.Doc).GetValues("title").FirstOrDefault());
                }
            }
        }
    }
}

The Caching Application Block is another application block specifically designed for this purpose. Developers often choose libraries such as this along with Lucene.Net, or external solutions such as AppFabric. While AppFabric is a caching framework that serves across all APIs based on URI hashing and request and response caching, developers may look for caching within their library implementation. Lucene.Net is more of an inverted index that helps with search and storage.
By the way, Lucene can be used to build a source index server for all your source code.
Here we build an index and run the queries against it. Lucene.Net has an IndexWriter class that is used to make this index. You need an Analyzer object to instantiate this IndexWriter.  Analyzers are based on language and you can also choose different tokenizers. For now, a SimpleAnalyzer could do. The input to the index is a set of documents. Documents are comprised of fields and fields are key value pairs. You can instantiate a SimpleFileIndexer and index the files in a directory. A Directory is a flat list of files. Directories are locked with a LockFactory such as a SingleInstanceLockFactory. Indices are stored in a single file.
I want to mention that Lucene comes with a Store to handle all the persistence. This store also lets us create  RAM-based indices.  A RAMDirectory is a memory resident Directory implementation.
Other than that it supports all operations as a regular directory in terms of listing the files, checking for the existence of a file, displaying the size in bytes, opening a file and reading bytes from a file or writing bytes to the file as input and output streams.
Lucene.Net's implementation of a cache supports algorithms such as LRU or a HashMap to store key-value pairs. If you want a synchronized cache you have to ask for it explicitly. The caches support simple operations such as Get and Put.
Searching and storage are facilitated by Lucene.Net, but you can also use such things as a SpellChecker and Regex. Query strings can be defined as a series of clauses. A clause may be prefixed by a plus or minus sign indicating whether it is to be included or excluded, or by a term followed by a colon indicating the field to be searched, such as with multiple terms. A clause may either be a term, indicating all the documents that contain this term, or a nested query, enclosed in parentheses. If a range query is used, the QueryParser tries to detect ranges such as date values for a query involving date fields. The QueryParser is not thread-safe. However, you could also customize the QueryParser by deriving from the default, and this can be made modular. The only caveat is that the .NET library based indexer does not have the enumerateFilesOrDirectories that the Java library does.
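A hedged sketch of running such a query against the index built in the earlier sample (assumes Lucene.Net 3.x; the "contents" and "title" field names match that sample):

using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

class SearchSample
{
    static void Main(string[] args)
    {
        var indexAt = FSDirectory.Open(new DirectoryInfo(
            Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData)));
        using (var searcher = new IndexSearcher(indexAt, true))
        {
            // parse a clause-based query string, e.g. "IndexWriter +title:Program"
            var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "contents", new SimpleAnalyzer());
            var query = parser.Parse(string.Join(" ", args));
            var hits = searcher.Search(query, 10);
            foreach (var scoreDoc in hits.ScoreDocs)
            {
                Console.WriteLine(searcher.Doc(scoreDoc.Doc).Get("title"));
            }
        }
    }
}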

Thursday, September 19, 2013

Database unit-tests
In the former posts I have mentioned using stored procedures with the Enterprise Library Data Access Application Block. Here I want to take a moment to describe some tests at the database unit level. Since the application relies on the data reader, and we are therefore free to make changes at the database level, we should have tests for this interface. These tests are called database unit-tests, or DUTs for short. In our case, with the stored procedures, we simply execute the stored procedures with different parameters in these tests. They cover both the input and output parameters and check the results. The parameters, their default values and nulls, if any, are important to the execution of the stored procedure, and the application should be able to rely on the checks within the stored procedure for a proper user response. This means that the stored procedure should not make any assumptions about the call.
Let's take some examples here.
If I have a procedure that validates whether a token matches a client id, then I can pass the clientId directly to the predicate in the where clause, and this will handle null values correctly.
Similarly, each CRUD operation could be expected to fail along the lines above,
and if there are two or more stored procedures that are expected to work with one another, then they should be tested together.
Thus the granularity of the tests depends on the usage and purpose of these database objects.
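Here is a hedged NUnit sketch of one such database unit-test for the token-versus-client check mentioned above; the stored procedure name, parameters and connection string are assumptions.

using System.Data;
using System.Data.SqlClient;
using NUnit.Framework;

[TestFixture]
public class TokenProcedureTests
{
    private const string ConnectionString = "Server=.;Database=OAuth;Integrated Security=true";

    [Test]
    public void ValidateToken_ReturnsNoRows_ForMismatchedClient()
    {
        using (var connection = new SqlConnection(ConnectionString))
        using (var command = new SqlCommand("dbo.uspValidateToken", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@Token", "not-a-real-token");
            command.Parameters.AddWithValue("@ClientId", 12345);
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                Assert.IsFalse(reader.Read()); // no match expected for a mismatched client
            }
        }
    }
}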
It's up to the database objects to make the interface strict or inviting by using default values or nulls for the parameters, but this has nothing to do with the fact that we want tests at this level to prevent regressions during application development. The goal of the Enterprise Library has been to make it easy to build applications by providing building blocks wherever applicable. We could choose from the application blocks as appropriate.
Another thing I wanted to bring up is the NUnit tests for the application development. These tests at the application level capture and pin down the user expectations. This is very helpful for keeping a seamless and consistent view for the user as the application evolves. When we make changes to the database and the application accommodates the changes, these tests help to catch regressions at the user level.
Finally, I want to mention that the package should be portable. Most libraries and application code can now be packaged and repackaged as NuGet packages. NuGet is a package manager that works well with the tools we discuss.
Enterprise Library Data Access Application Block discussion continued.
DAAB supports transactions. The get and save methods for tables are LoadDataSet and UpdateDataSet: LoadDataSet executes a command and returns the results into an existing dataset, while UpdateDataSet accepts a dataset and saves all the modified data records into the database. ExecuteDataSet executes a command and returns the results in a newly created dataset. ExecuteNonQuery can be used to invoke a stored procedure or a SQL statement. Most of these are already guarded against SQL injection attacks. Most of these also accept a transaction that we can create with
IDbTransaction transaction = connection.BeginTransaction();
and a try/catch block can be used to roll back or commit the transaction.
You can see that the transaction parameter is reused from ADO.NET. It is best not to keep the transaction open for a long time. You could change one row at a time using the UpdateDataSet method or do bulk updates to the entire table. You could execute a query or a stored procedure; in many cases, the latter helps encapsulate the logic at the database level. Since the DAAB itself includes unit-tests, these are also available for review or for use alongside your own. The unit-tests help with test-driven development and come in very handy for ensuring the quality of the development work as it proceeds.
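A hedged sketch of those calls is shown below; the stored procedure names, the table name and the presence of a default database in configuration are assumptions, not the actual schema.

using System.Data;
using System.Data.Common;
using Microsoft.Practices.EnterpriseLibrary.Data;

public static class TokenData
{
    public static DataSet LoadTokens(int clientId)
    {
        Database db = DatabaseFactory.CreateDatabase();
        DbCommand cmd = db.GetStoredProcCommand("uspGetTokensByClient");
        db.AddInParameter(cmd, "ClientId", DbType.Int32, clientId);
        return db.ExecuteDataSet(cmd);
    }

    public static void SaveTokens(DataSet tokens)
    {
        Database db = DatabaseFactory.CreateDatabase();
        using (DbConnection connection = db.CreateConnection())
        {
            connection.Open();
            DbTransaction transaction = connection.BeginTransaction();
            try
            {
                // all changes to the "Token" table commit or roll back together
                db.UpdateDataSet(tokens, "Token",
                    db.GetStoredProcCommand("uspInsertToken"),
                    db.GetStoredProcCommand("uspUpdateToken"),
                    db.GetStoredProcCommand("uspDeleteToken"),
                    transaction);
                transaction.Commit();
            }
            catch
            {
                transaction.Rollback();
                throw;
            }
        }
    }
}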
I want to bring up a couple of important points from the posts made so far on this topic:
1) Application blocks are developed by Microsoft patterns and practices and they are very helpful in developing applications because they provide building blocks.
2) The block pattern allows for encapsulation and isolation of commonly used development libraries, and very often embodies practices that have worked in more than one deployment.
3) The seven different application blocks are:
caching application block
configuration application block
cryptography application block
data access application block
exception handling application block
logging and instrumentation application block
security application block
4) The security, caching and logging application blocks have an optional dependency on the data access application block.
5) The blocks can be chosen and included in the application with minimal changes to the application
6) Since they are already developed and tested, each application block saves you time for the purpose it was built for.
7) In the case of the DAAB, the connection management, the abstraction over ADO.NET and database objects and their dependencies, and avoiding the use of an object graph in the code enable you to focus on the application and the results from the data access that you are interested in.


Wednesday, September 18, 2013

In addition to the earlier post on the benefits of the Enterprise Library Data Access Application Block (DAAB), I want to bring up some more in this post. I mentioned the removal of the dependency on the object graph and its persistence mechanism with DAAB, but another advantage that may not be so obvious is that database changes are now decoupled from the application. And this is a very desirable thing for rapid application development, because database changes are very likely as new scenarios or refinements appear while the application is being developed. Choosing DAAB over EDM in such cases helps avoid the refreshes to the code and model which were previously causing code and test churn. In this case, as long as the data readers don't see the changes to the database, the application can continue to evolve separately. Many a time the database changes affect more than one table or schema. For example, a column for an identifier to an entity in an external system may need to be maintained. Or a new column may need to be added to indicate the source of the entity. Or a computed or translated column may need to be added for lookups. Or the data may exist side by side in two different forms for use by different consumers. Indexes may need to be added or a schema redesign may be involved. To the stored procedures invoked by the application, these changes need not be visible. In fact, DAAB allows the interface between the application and the data provider to be established first.
DAAB doesn't take away any of the benefits of the schema and objects in the database; they can continue to be accessed via the Database object. If anything, the Database object can be created simply with a factory class.

Further, parameterized queries can be constructed for ad hoc commands. This is done by passing the text to GetSqlStringCommandWrapper and invoking AddInParameter. The ExecuteDataSet method executes the DbCommandWrapper's command, retrieves the results, creates a new DataSet and fills it. There are many overloads of the ExecuteDataSet method, which allows for a variety of calls.

XML data can also be retrieved with an XmlReader object and the ExecuteXmlReader method. The stored procedure uses the FOR XML AUTO clause to return the data in XML format.
This post talks about the Enterprise Library Data Access Application Block. I was able to go over the documentation on MSDN. Applications that use custom connection state and pooling as well as their own connections usually run into scalability and resource problems. The Enterprise Library opens and closes the connection as needed, leaving you to focus on the data reader and the data. You can edit the configuration with a visual tool. A Database object is created using the DatabaseFactory.CreateDatabase method. The configuration tool stores the settings in the DAAB (Data Access Application Block) configuration. The database instances node is used to associate one of the instance types, such as SQL or Oracle, with one of the connection strings. The connection string is stored in the configuration file and provides mechanisms to configure the security for the connection; these can be encrypted so that they are not in clear text. As mentioned, the Database object is created first, and this is done with a factory pattern. In ADO.NET, you open a connection, then fill a dataset or retrieve data through a data reader usually typed to the provider. In this application block, all of these are abstracted: you just instantiate the Database object and execute the reader using a command wrapper. This Database class has dozens of methods, most notably ones to execute a stored procedure or SQL statement; return a dataset, a data reader, a scalar value, an XmlReader or nothing; allow specific parameters to be created and passed in; determine which parameters a command needs, create them and cache them; and enlist commands in a transaction.
The Database object has methods like GetStoredProcCommand, GetParameterValue, AddInParameter and AddOutParameter that give detailed control over the command to be executed. ExecuteScalar and ExecuteReader are some of the methods for execution. Since the results from the execution can be read from a data reader, we can directly populate objects with the results without having to create an object graph. This removes a lot of the complexity that comes with object graph refreshes. Direct manipulation of data is possible with methods like LoadDataSet and UpdateDataSet, where you can specify the CRUD operation command and a transaction if necessary. You could also directly get the data adapter that implements the CRUD operation on the data source.

Tuesday, September 17, 2013

Today I tried out the EnterpriseLibrary.Data framework and it was a breeze. For those familiar with Entity Framework, this provides more streamlined access to data. For example, you can wire up stored procedure results to the collection of models you define, so that you can work with them instead of the entire object graph. There is a lot of debate about the performance of Entity Framework, and perhaps in earlier blogs I may have alluded to the different levers we have there to improve it. However, the Enterprise Library comes with the block pattern that these libraries have become popular for. Blocks are reusable patterns across applications, so your development time is cut down, and they come with the reliability and performance these libraries are known for.
I want to bring up the fact that we associate the database by using the convenient DatabaseFactory.CreateDatabase method to work with the existing databases in SQL Server. Some data access extensions may need to be written to translate the data reader columns to the objects, and this helps because you can translate the results of the stored procedure execution directly into a collection of the objects you have already defined as models, without the onus of the object graph.
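A hedged sketch of the kind of data access extension described here: translating the columns of an IDataReader produced by a stored procedure into a collection of models. The Token model, the column names and the calling pattern are assumptions.

using System.Collections.Generic;
using System.Data;

public class Token
{
    public long UserId { get; set; }
    public int ClientId { get; set; }
    public string Value { get; set; }
}

public static class DataReaderExtensions
{
    public static List<Token> ToTokens(this IDataReader reader)
    {
        var tokens = new List<Token>();
        while (reader.Read())
        {
            tokens.Add(new Token
            {
                UserId = reader.GetInt64(reader.GetOrdinal("UserId")),
                ClientId = reader.GetInt32(reader.GetOrdinal("ClientId")),
                Value = reader.GetString(reader.GetOrdinal("Token"))
            });
        }
        return tokens;
    }
}

The extension would be used right after an ExecuteReader call, for example: var tokens = db.ExecuteReader(cmd).ToTokens();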
In addition, there are no configuration sections involved in the config files, and the assemblies can be installed and added to the solution using the Visual Studio NuGet package manager.

OAuth bearer tokens practice

OAuth bearer tokens may currently be passed in the URL, but the RFC seems to clearly call out that this should not be done. Therefore, checks and other mechanisms to safeguard these tokens should be in place. As an example, this parameter could be passed in the request body instead, or the authorization server may handle client validations. Beyond that, much is left to the implementation.
In general, if the server and the clients communicate via TLS and they have verified the certificate chain, then there is little chance of the token falling into the wrong hands. URL logging or an HTTPS proxy are still vulnerabilities, but a man-in-the-middle attack is less of an issue if the client and the server exchange session ids and keep track of each other's session id. For an API implementation, session ids are largely a site or application concern and not the API's, but it is good to validate based on the session id if one is available.
Sessions are unique to the application. Even the client uses refresh tokens or re-authorizations to keep the session alive. At the API level, if sessions were tracked, they would not be tied to the OAuth revokes and re-authorizations, hence relying on the session id alone is not preferable. At the same time, using the session id as an additional parameter to confirm along with each authorization helps tighten security. It is safe to assume the same session prevails until the next authorization or an explicit revoke. By tying the checks exclusively to the token, we keep this streamlined to the protocol.
OAuth can be improved upon, but it certainly enables redirections that make things easier for the user. In addition, the use of expiring tokens enables clients to reduce the chatter with the authorization server.
In addition, many applications can now redirect to each other for the same user's authorizations, so the user has to sign in far less often than before. If the user is signed in to a few sites, he can use the existing signed-in status to gain access to other sites. This is not just a mere convenience to the user; it enables the same user to float between sites and also enables applications to integrate and share user profile information for a richer user experience.

Monday, September 16, 2013

Tests for the client validation changes include the following
1) specify one client based token grant and access by another client
2) specify token grant to one client and revoke by same client and reuse of a revoked token by the same client
3) specify token grant to one client and revoke by a different client
4) specify token grant to one client, revoke by a different client, and reuse by the original client
5) specify low privileged token grant to one client, specify high privileged token grant to same client, use of both tokens by the same client
6) specify low privileged token grant to one client, access low privileged token by another client
7) specify user privileged token grant to one client, specify token grant by same user to another client, clients exchange token
8) specify user privileged token grant to one client, specify token grant by different user to same client, client swaps token
9) specify user privileged token grant to one client, have the client request several tokens until a large number is reached
10) specify user privileged token grant to multiple clients until a large number of clients is reached
11) specify user privileged token grant and revoke to same client a large number of times

Delegated tokens or bearer tokens
The RFC makes special provisions for bearer tokens. A bearer token can be presented by any party that possesses it to access any resource protected by that token. Therefore they should be stored and transmitted with care.
For example, these tokens can be sent in the following ways:

1) When the access token is sent in the Authorization header of the HTTP request, a predefined syntax is used which takes the form "Bearer 1*SP b64token", where b64token is base64-style text. (A client-side sketch of this header follows after this list.)
As an aside, the b64token syntax allows one or more characters drawn from ALPHA, DIGIT and the "-", ".", "_", "~", "+", "/" set, followed by zero or more "=" characters.
2) The bearer token could be sent in the request body as the "access_token" parameter using "application/x-www-form-urlencoded".
3) The URI query parameter could also include "access_token="; however, it should then be sent over TLS, along with specifying a "Cache-Control" header with the private option.
Since URIs are logged, this method is vulnerable and is discouraged by the RFC. It documents current usage but goes so far as to say it "SHOULD NOT" be used, and the parameter name is a reserved one.
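A small sketch of the first option using HttpClient (the URL and token value are placeholders):

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

class BearerClientSample
{
    static async Task Main()
    {
        using (var client = new HttpClient())
        {
            // send the access token in the Authorization header: "Bearer <b64token>"
            client.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Bearer", "mF_9.B5f-4.1JqM");
            var response = await client.GetAsync("https://api.example.com/user/me/profile");
            Console.WriteLine(response.StatusCode);
        }
    }
}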

If the request carried a token but failed validation, it can be answered with error codes such as invalid_request, invalid_token and insufficient_scope, as opposed to unauthenticated requests, to which no error information is divulged.
Threats can be mitigated if the
1) tokens are tamper-proof
2) tokens are scoped
3) tokens are sent over TLS
4) TLS Certificate chains are validated
5) tokens expire in reasonable time
6) the token exchange is not vulnerable to eavesdroppers
7) Client verifies the identity of the resource server ( this is known as securing the ends of the channel)
8) tokens are not stored in cookies or passed as page URLs
In this post, we talk about client registrations. OAuth mentions that clients be given a set of credentials that they can use to authenticate with the server. This is much like the username and password, except that the client password is base64 encoded and is called the client secret. The client id and secret are issued at the time of registration. Therefore the authorization server, which also hosts the WebUI, could host this registration site and thereby reduce the dependency on the proxy. Besides, this will integrate the developers with the regular users of the site.
Every piece of information that the client provides is important. Also, the access token that is issued has some parameters we talked about earlier, such as scope, state, etc. However, one field I would like to bring up in this post is the URI field. This is supposed to be the redirection URI, along with state, from the client. It is seldom used but is a great way to enforce additional security.
In the list of things to move from the proxy to the provider, some of the items are the token mapping table, the validation in each API to ensure the caller is known and the token is the one issued to that caller, and the checks for a valid user at each of the authorization endpoints where user authorization is requested.
WebUI redirection tests are important, and for this a sample test site can be written that redirects to the OAuth WebUI for all users and handles the responses back from the WebUI. A test site will make the redirects visible in the browser.
The test site must test the webUI for all kinds of user responses to the OAuth UI in addition to the testing of the requests and responses from the WebUI.
WebUI testing involves a test where the user sees more than one client that has been authorized. Updates to this list are part of WebUI testing; therefore the registration and removal of apps from this list have to be tested. This could be done by using authorization requests to the server based on different clientIds and clientSecrets. The list of clients will come up in HTML, so the HTML may have to be parsed to check for the names associated with the different registered clientIds.
Lastly, WebUI error message handling is equally important. If appropriate error messages are not provided, the user may not be able to take the rectifiable steps. Moreover, the WebUI properties are important to the user in that they provide additional information or self-help. None of the links on the WebUI should be broken or misspelled. The WebUI should provide as much information about its authenticity as possible; this way it provides additional deterrence against forgery.

Sunday, September 15, 2013

This post discusses APIs to remove all user/me resource qualifiers from the API config routes. If the OAuth implementation doesn't restrict a client from using the notion of a superuser who can access other user profiles based on /user/id, that would mean the protocol is flexible in this regard.
Meanwhile, this post also talks about adding custom validation via ActionFilterAttributes.
For performance, should we be skipping token validation on all input parameters?
This is important because it lowers security in favor of performance, and the tradeoff may have implications beyond the customer.
That said, even for the critical code path, security has to be applied to both the administration endpoints and the token granting endpoints.
The token granting mechanisms also need to make sure the following are correct (a sketch follows this list):
1) the tokens are not rotated or reused again.
2) the token hash is generated using the current timestamp.
3) the token hash should not be based on the userId and clientId.
Should the tokens be encrypted instead, they could carry the userId and clientId so that these can be recovered on decryption.
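A minimal sketch of points 2) and 3) above, assuming tokens are opaque hashes rather than encrypted strings: the hash is derived from a random nonce and the issue timestamp, never from the userId or clientId.

using System;
using System.Security.Cryptography;

static class TokenIssuer
{
    // Issues an opaque token whose value is independent of userId and clientId.
    public static string NewToken()
    {
        var nonce = new byte[32];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(nonce);
        }

        // Mix in the current timestamp so tokens issued at different times differ
        // even in the unlikely event of a nonce collision.
        var timestamp = BitConverter.GetBytes(DateTime.UtcNow.Ticks);
        var input = new byte[nonce.Length + timestamp.Length];
        Buffer.BlockCopy(nonce, 0, input, 0, nonce.Length);
        Buffer.BlockCopy(timestamp, 0, input, nonce.Length, timestamp.Length);

        using (var sha = SHA256.Create())
        {
            return Convert.ToBase64String(sha.ComputeHash(input));
        }
    }
}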
The third post will talk about client registrations separately, since they are currently tied to the proxy and are not in immediate scope.


In this post, we will describe the implementation in a bit more detail.
First we will describe the database schema for OAuth.
Then we will describe the logic in the controllers that will validate and filter out the bad requests and those explicitly prohibited.
Then we will describe the tests for the WebUI integration. All along the implementation we will use the features available from the existing proxy and stage the changes needed to remove the proxy.
First among these is the schema for the token table. This table requires a mapping of the userId and the clientId together with the issued token. This table exists entirely for our compliance with OAuth security caveats; hence users, clients and the proxy are unaware of it and will not need to make any changes on their side due to any business rules associated with it, as long as they are OAuth compliant. Since the token is issued by the proxy, we will need to keep the token request and response information in this table. In addition, we will record the apiKey and clientId from the client along with the token, even though the proxy may be enforcing this already. (Note that the clientId is internal and the apiKey is public, and they are different.) As described in the previous post, this helps us know whom the token was originally intended for and whether any misuse occurs by a third client. And we will keep the user mapping as optional, or require a dummy user, since some clients may request credentials only to access non-privileged resources. It is interesting to note that the userId is entirely owned by our API and retail company, but the check that an access token issued on behalf of one user is used only with that user's resources is currently enforced by the proxy. That means the proxy is either keeping track of the issued tokens with the user context or passing the user context back to the API with each incoming token. (A rough sketch of such a table follows.)
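Here is a rough sketch of that token table expressed as a C# entity; the property names are my own guesses at the columns described above, not the schema the implementation settled on.

using System;

// One row per issued token, mapping it to the client (and optionally the user) it was issued for.
public class OAuthTokenRecord
{
    public long Id { get; set; }                 // surrogate key
    public string AccessToken { get; set; }      // token issued by the proxy
    public string ApiKey { get; set; }           // public key presented by the client
    public string ClientId { get; set; }         // internal client identifier
    public long? UserId { get; set; }            // optional: client-credential-only tokens carry no user
    public string RequestPayload { get; set; }   // token request as received
    public string ResponsePayload { get; set; }  // token response as issued
    public DateTime IssuedAtUtc { get; set; }
    public DateTime ExpiresAtUtc { get; set; }
    public bool Revoked { get; set; }
}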
But we have raised two questions already. First, should we treat the user context as nullable, or should we default to a dummy user? Second, does the proxy pass back the user context in all cases, or should the API implement another way to look up the user given a token for non-proxy callers?
Let us consider the first. The requirement to have a non-nullable field with a default value for client-credential-only calls certainly improves validation and governance. Moreover, we can then have a foreign key established with the user table so that we can look up user profile information directly off the token after token validation. This leads the way to removing the ugly "/user/me" resource qualifier from all the APIs that access user-privileged resources. The userId is for internal usage only anyway, so the APIs look cleaner and we can internally catalog and map the APIs to the type of access they require. This means having another table with all the API routes listed and their access classified, such as user privileged or general public. This table is not just an API-security table but also provides a convenient placeholder for generating documentation and checking the correctness of listings elsewhere. Such additional dedicated resources for security could be considered an overhead, but we will try to keep it minimal here. Without this table, we will assume that each API that accesses privileged resources internally applies the userId retriever ActionFilterAttribute, and that the retriever applies and enforces the necessary security.
We will also answer the second question this way. The proxy provides user information via the "X-Mashery-Oauth-User-Context" request header. This lets us know that the token has been translated to a user context and that the proxy has looked it up in a token database. That token database does not serve our API security, otherwise we would not be discussing our schema in the first place. So let's first implement the schema and then we will discuss steps 2 and 3. (A small sketch of reading that header follows.)
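A small sketch, assuming an ASP.NET Web API controller sitting behind the proxy; the header name comes from the discussion above, while the helper itself is illustrative.

using System.Collections.Generic;
using System.Linq;
using System.Net.Http;

public static class UserContextReader
{
    // Returns the user context injected by the proxy, or null for non-proxy callers.
    public static string GetUserContext(HttpRequestMessage request)
    {
        IEnumerable<string> values;
        if (request.Headers.TryGetValues("X-Mashery-Oauth-User-Context", out values))
        {
            return values.FirstOrDefault();
        }
        return null; // fall back to the token lookup discussed above
    }
}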

Saturday, September 14, 2013

Improvements to OAuth 2.0 if I could venture to guess:
1) Tokens are not hashes but an encryption carried in the request and response body (a hedged sketch of this idea follows this list). A token is an end result representing an agreement between user, client, and one or more parties. A given token could be like a title on an asset, immediately indicating the interests vested in the asset - be it the bank, an individual or a third party. However, the token must be interpretable by the authorization server alone and nobody else. It should be opaque to both the user and the client and understood only by the issuer, so that the issuer can establish who the caller and the approver were. The client and the user can use their state and session properties respectively to validate the server; in both cases, this has nothing to do with the token. Such token processing obviates persisting tokens in a store for lookup on each API call. It moves the performance overhead from storage and CPU to mostly CPU, which should be welcome on server machines. That said, almost every API makes a call to a data provider or database, and yet another call to associate a hash with a user and a client is not only simpler but also archivable and available for audit later. Tests will find that option easier, more maintainable, and common to both dev and test. That option also scales up and out just like any other API call, and since it is a forward-only rolling population, there is the possibility of keeping the size finite and even recycling the tokens. The internal representation of user and client has nothing to do with the information exchanged in the query string, so the mapping is almost guaranteed to be safe and secure. The hash approach also has the conventional merits of good bookkeeping with database technologies, such as the ability to do change data capture, archival, and prepared plan executions. In the encryption-based token scenario, the entire request and response capture may need to be taken for audit and then parsed to isolate the tokens and discover their associations, and each discovery may need to be repeated over and over again in different workflows. Besides, an encrypted string is not as easy to cut and paste as a simple hash in cleartext over HTTP. That said, encryption of request parameters or API call signatures is already used in practice, so an encryption-based token should not face a high barrier to adoption. Besides, the tokens are cheap to issue and revoke.
2) Better consolidation of the grant methods offered. This has been somewhat alluded to in my previous post, and it would simplify the endpoints and the mechanisms to what is easy to propagate. For example, the authorization code grants and the implicit grants need not come from an endpoint different from the token granting endpoints, since they are semantically the same. Code-to-token exchange and refresh could be considered different, but in the end internal and external identifiers will be maintained anyway, be it for user, client or token. Hence, the ability to treat all clients as one, to treat access tokens with or without user privileges uniformly, and to treat all requests and responses as pipelined activities will make this more streamlined. Finally, an OAuth provider does not need to be distributed between the proxy and the retailer; in some sense, these can all be consolidated in the same stack.
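A hedged sketch of the encryption-based token in point 1): the authorization server alone holds the key, encrypts "userId:clientId:issuedAt" into the token, and decrypts it later without a lookup table. Key management and padding or authentication concerns are deliberately omitted.

using System;
using System.Security.Cryptography;
using System.Text;

public class EncryptedTokenCodec
{
    private readonly byte[] _key; // 256-bit key known only to the authorization server

    public EncryptedTokenCodec(byte[] key) { _key = key; }

    // Issue a token that is opaque to the user and the client but decryptable by the issuer.
    public string Issue(string userId, string clientId)
    {
        var plaintext = string.Format("{0}:{1}:{2}", userId, clientId, DateTime.UtcNow.Ticks);
        using (var aes = Aes.Create())
        {
            aes.Key = _key;
            aes.GenerateIV();
            using (var encryptor = aes.CreateEncryptor())
            {
                var data = Encoding.UTF8.GetBytes(plaintext);
                var cipher = encryptor.TransformFinalBlock(data, 0, data.Length);

                // Prepend the IV so the issuer can decrypt later; the result stays opaque to callers.
                var token = new byte[aes.IV.Length + cipher.Length];
                Buffer.BlockCopy(aes.IV, 0, token, 0, aes.IV.Length);
                Buffer.BlockCopy(cipher, 0, token, aes.IV.Length, cipher.Length);
                return Convert.ToBase64String(token);
            }
        }
    }

    // Recover "userId:clientId:issuedAt" from the token; only the key holder can do this.
    public string Decode(string token)
    {
        var bytes = Convert.FromBase64String(token);
        using (var aes = Aes.Create())
        {
            aes.Key = _key;
            var iv = new byte[aes.BlockSize / 8];
            Buffer.BlockCopy(bytes, 0, iv, 0, iv.Length);
            aes.IV = iv;
            using (var decryptor = aes.CreateDecryptor())
            {
                var plain = decryptor.TransformFinalBlock(bytes, iv.Length, bytes.Length - iv.Length);
                return Encoding.UTF8.GetString(plain);
            }
        }
    }
}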
The TFS Query object is used to run queries stored on the TFS server. When this class is instantiated, the constructor requires at least two parameters: the WorkItemStore and the WIQL text. The WIQL text can be navigated to using the QueryFolder and QueryDefinition for the 'Shared Queries' folder and the name of the query respectively, and then using the QueryText property on the QueryDefinition. However, if the WIQL text references variables such as '@Project', it throws a TFS error unless those variables are resolved. That turned out to be a matter of wrapping constants in quotes and of TFS permissions, and it is now automated (a hedged sketch of supplying the variable follows).
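Here is a hedged sketch of one way to satisfy the '@Project' reference: the Query constructor accepts a context dictionary that resolves such macros. The key name used below is an assumption on my part.

using System.Collections;
using Microsoft.TeamFoundation.WorkItemTracking.Client;

static class StoredQueryRunner
{
    // Runs a stored query definition whose WIQL refers to the '@Project' variable.
    static WorkItemCollection Run(WorkItemStore store, QueryDefinition definition, string projectName)
    {
        var context = new Hashtable();
        context.Add("project", projectName); // assumed key that resolves '@Project' in the WIQL

        var query = new Query(store, definition.QueryText, context);
        return query.RunQuery();
    }
}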

Friday, September 13, 2013

Assuming that security provided by an OAuth implementation for REST APIs is a layer above the actual APIs for various services, there should be no logic within the API implementation that checks the user or client or their mapping to a token. That should be taken care of such that all API implementations except for the OAuth ones will work for all callers. In such a case, the APIs should uniformly be governed by this security declaration. One way to implement that would be to declare an ActionFilterAttribute such that all APIs can be decorated with it. This provides a uniform security declaration.
The implementation of this ActionFilterAttribute can, for example, check the following (a sketch follows below):
1) validate the api key, i.e. that it belongs to a known set of registered clients
2) validate the access token by pulling up the corresponding userId, clientId and their mapping
These implementations sit at the controller level but can be factored out into private methods and extensions.
The attribute itself may be packaged in a separate assembly, say an OAuth2.Provider.web.dll, and made available via NuGet.
The checks for userId may already be available via API implementations that rely on aliases for userId.
The checks for clientId and token mapping require talking to OAuth providers, either local or remote, and hence need additional configuration sections from which to retrieve these values.
The last check is applicable across all APIs since the apiKey and access tokens are available to each.
The mapping for the tokens could be stored centrally in the same database as the user profiles, from where the userId is validated.
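Putting the above together, here is a hedged sketch of such an attribute for ASP.NET Web API. The IOAuthTokenStore interface and the parameter names are placeholders of my own, not an existing package.

using System.Net;
using System.Net.Http;
using System.Web.Http;
using System.Web.Http.Controllers;
using System.Web.Http.Filters;

// Placeholder abstraction over the central token mapping store.
public interface IOAuthTokenStore
{
    bool IsRegisteredClient(string apiKey);
    bool TokenMatchesClient(string accessToken, string apiKey);
}

public class RequireOAuthTokenAttribute : ActionFilterAttribute
{
    public static IOAuthTokenStore TokenStore { get; set; } // wired up at application start

    public override void OnActionExecuting(HttpActionContext actionContext)
    {
        var query = actionContext.Request.RequestUri.ParseQueryString();
        var apiKey = query["api_key"];
        var accessToken = query["access_token"];

        // 1) the api key must belong to a known, registered client
        // 2) the access token must map back to that same client
        if (string.IsNullOrEmpty(apiKey) || string.IsNullOrEmpty(accessToken) ||
            !TokenStore.IsRegisteredClient(apiKey) ||
            !TokenStore.TokenMatchesClient(accessToken, apiKey))
        {
            actionContext.Response = actionContext.Request.CreateErrorResponse(
                HttpStatusCode.Unauthorized, "invalid_token");
            return;
        }

        base.OnActionExecuting(actionContext);
    }
}

Decorating a controller or an action with this attribute would then enforce both checks uniformly, without adding any logic to the API implementation itself.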

This post continues from the previous post in that we discuss the validations for the client and the client authorizations. Specifically, we did not mention the state parameter and its use to thwart cross-site request forgery attacks. Such an attack is mounted when a victim user agent is made to follow a malicious URI to a trusting server. An attacker usually injects its own access token to make the victim client post sensitive user information to the attacker's protected resources instead of the user's resources. The OAuth server associates an access token with a user. Given an access token, it will correctly resolve the user associated with the token, but it may not detect that an access token has been replaced, because the same client could have been authorized by more than one user. With this injected token, the attacker is able to hijack the session and gain access to sensitive information.
CSRF protection therefore becomes necessary. The redirection URI is typically protected by requiring any request to include a value that binds the request to the user agent's authenticated state, such as a hash of the session identifier in the state parameter.
This additional information enables a client to verify the validity of the redirects. It is simple for clients to enforce and hard for an attacker to circumvent (a small sketch follows).
The authorization server must also enforce such CSRF protection for its authorization endpoint and ensure that a malicious client cannot gain an access token without user intervention.
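A small sketch of that binding, assuming the session identifier is available to the client; the helper names are illustrative.

using System;
using System.Security.Cryptography;
using System.Text;

public static class OAuthStateHelper
{
    // Derive the 'state' value from the user agent's session identifier.
    public static string BuildState(string sessionId)
    {
        using (var sha = SHA256.Create())
        {
            return Convert.ToBase64String(sha.ComputeHash(Encoding.UTF8.GetBytes(sessionId)));
        }
    }

    // On the redirect back, accept the response only if the state matches this session.
    public static bool IsValidState(string returnedState, string sessionId)
    {
        return string.Equals(returnedState, BuildState(sessionId), StringComparison.Ordinal);
    }
}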

Thursday, September 12, 2013


In an OAuth implementation, there are four kinds of TokenProviders: ResourceOwnerTokenProvider, ClientCredentialTokenProvider, AuthorizationCodeTokenProvider and RefreshTokenProvider. It's the TokenProvider that implements the checks for a valid user Id and a valid client Id. Therefore, the TokenProvider should also implement the check that the token is mapped to the same user Id and clientId as expected. The access token should not be stripped by the proxy; if the proxy strips the access token before forwarding the request to the API implementation, then the proxy itself has to implement the check that the token maps to the user and the client correctly, because unless the access token is received by the API implementation, it is hard for the API to validate it. The only mitigation in such a case is to keep a registry of all TokenProvider calls, such that the incoming clientIds are recorded and queried against the proxy. If a client fails the proxy lookup, it should be denied any tokens. For clients that are registered but not user authorized, we look them up with another call to the proxy asking for the applications allowed by the client. If two clients are registered and authorized by the same user, the registry of clientId and accessToken maintained by the API will mitigate hijacking of a token between these two clients. This step-by-step check is only necessary when we are relying on a proxy that doesn't automatically validate the clients before issuing tokens to them. These steps rely on the assumption that the proxy provides methods to look up client information based on the client Id and to list all the clients authorized by a given user.
When the API has its own implementation of an OAuthProvider, instead of relying on the proxy, the token database maintained by the API will have an association between the token, the user and the client. That enables a single central check to validate a given access token instead of authenticating a client.
The OAuth RFC mentions there is more than one way to authenticate a client, the common practice being the use of client credentials such as a client id and a client secret. Other kinds of client authentication can be a password or a public/private key pair. The only thing the RFC states as a must is that clients should not be authenticated by their apiKey or clientId alone, since these are public and can be switched. Therefore a lookup of the client secret against a client registry is required. This client secret is sent to the authorization server in the request body, as opposed to the apiKey that's passed as a query string. Further, the RFC distinguishes confidential clients from public clients, saying confidential clients should have a client authentication method.


OAuth testing discussion continued
We had discussed a few test cases for OAuth earlier. The server has to validate the response as well.

Wednesday, September 11, 2013

QueryFolders and QueryDefinitions are how TFS arranges the QueryHierarchy in its object model. We were looking for ways to find items in this hierarchy. What we could do is cast the QueryHierarchy as another QueryFolder, then access the QueryItem by GUID or by name with the accessor Item["Guid"] or Item["String"]. It is not clear from the documentation whether these are recursive, but the parameter seems to indicate so.
In any case, you can use the FindNodeInSubTree method mentioned earlier with the Project object. Since the QueryHierarchy is part of the project, this search can return the desired result as well.
Another approach to searching for a specific query definition would be to organize the query definitions in a specific tree structure of folders described by, say, a SubTreeHierarchy. When you cast any node in the project-level sub-tree into this SubTreeHierarchy object and it doesn't turn out to be null, then you have an object that can tell you the path to the sought-after query definition in constant time.
Another approach is to do a breadth-first or depth-first search of the entire subtree (a sketch follows). Since the tree structure, at least for queries, is finite, the breadth-first and depth-first algorithms are well known, and the starting point could be anywhere in the QueryHierarchy tree, this works well too.
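For the depth-first variant, a hedged sketch might look like the following, assuming we are searching by name from any QueryFolder (including the QueryHierarchy itself cast as a folder):

using Microsoft.TeamFoundation.WorkItemTracking.Client;

static class QueryHierarchySearch
{
    // Depth-first walk of the query tree: returns the first QueryDefinition with a matching name.
    public static QueryDefinition FindByName(QueryFolder folder, string name)
    {
        foreach (QueryItem item in folder)
        {
            var definition = item as QueryDefinition;
            if (definition != null && definition.Name == name)
            {
                return definition;
            }

            var subFolder = item as QueryFolder;
            if (subFolder != null)
            {
                var found = FindByName(subFolder, name); // recurse into child folders
                if (found != null)
                {
                    return found;
                }
            }
        }
        return null;
    }
}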
In addition, lookup by name or Id could be faster if we used the hash that the Item accessor provides. If possible, the name string could be used instead of the GUID.
Another approach to this search could be to work with the structure alone instead of looking up attributes such as name or id. For example, the third folder node under the second folder node under the root has the query definition we are interested in, after we have located it in Team Foundation Server using the Work Items selection in the Visual Studio Team Explorer.
Regardless of the method we use, we are relying on the object model to lookup the query, execute it and give us the desired results.
The object model differs from the API model in that the API model does not hide complexity behind objects and their methods. In the API model, for example, we take well-known methods and arrange the resources. True, the API could appear as methods of a singleton object or a composition, but that's not what it is designed for. The APIs allow for any number of variations and mix-ins in the API calls to solve the different tasks the APIs have been written for. By keeping the methods the same and promoting them over HTTP, the API model only adds ease of use, diagnosability, and testing by visual means. An API trace identifies not only the sender, receiver, request, response, their payloads and parameters, but also the timestamp, the order and the duration of each call. This is sufficient to know what was invoked when, and it allows for usage outside the program.
This is where OData comes into the picture. What we could achieve with the client object model to gain access to the data is now available via the web, since the data is exposed directly as resources for individual or collective listing.

OAuth testing continued

One more test I needed to add to my earlier list is that a token's expiry time could be validated by waiting for the expiration time and trying the token again. In addition, we could test that refresh tokens are issued for non-expired tokens. A token issued to one client should not be usable by another client.
The spoofing client could even use the same API key as the spoofed client. If the same user authorizes two clients, both of which have now requested access tokens, then these tokens should be similar and work the same, but generally not be transferable or exchangeable. A client that requested user authorization cannot use the same token with a non-user-privileged API on behalf of another user.
In the previous post, there was a mention of the different services hosted by the Team Foundation Server. We will explore them in more detail now. These services are:
1. ITeamFoundationRegistry - Gets or reads user entries or values
2. IIdentityManagementService - Manage application groups and memberships.
3. ITeamFoundationJobService
4. IPropertyService
5. IEventService
6. ISecurityService
7. ILocationService
8. TswaClientHyperlinkService
9. ITeamProjectCollectionService
10. IAdministrationService
11. ICatalogService
12. VersionControlServer
13. WorkItemStore
14. IBuildServer
15. ITestManagementService
16. ILinking
17. ICommonStructureService3
18. IServerStatusService
19. IProcessTemplates

Tuesday, September 10, 2013

In today's post we want to cover the TFS client object model library in detail. We will describe the hierarchy. The TeamProjectCollection is the root object, and we can instantiate a WorkItemStore with this root object. A WorkItemStore has a collection of projects. We can look up the project we want by name. When we find the project by name using the Projects property of the WorkItemStore, the corresponding Project object is returned. This Project item has a set of properties we are interested in. The project has AreaRootNodes, which gets the collection of area root nodes. Recall that area nodes can have path qualifiers, so these are tree nodes. The next property is the Categories property, which gets us the collection of work item type categories that belong to this project.
Take, for example, the QueryHierarchy property. This is a replacement for the obsolete StoredQueries type and lets you get all the query items, whether they are query folders or query definitions. Note that the query definition files are viewable only: there is a property called QueryText that gives you the text of the WIQL stored in these files but no way of executing that WIQL directly. You can then use this text to instantiate a Query object and invoke RunLinkQuery. You need both the WorkItemStore and the WIQL to instantiate the Query object. Thus, given a URI to the TFS server, we have programmatically traversed the object model to find the query and execute it. Whenever you make or change the QueryItems in this tree, you can simply save the QueryHierarchy; this will save any changes anywhere in the tree. If you were to look for an item in this tree, you may have to implement a recursive search method that enumerates the contents of the current QueryFolder. However, if you have a GUID you can use the find method to retrieve that specific item. There is a FindNodeInSubTree method that can do this recursion, and it accepts a lookup based on a specified ID or a path. In most cases, this works well because when we create, update or delete TFS work items in Visual Studio, we can get their GUIDs by using copy full path or by a previous client object model call. There is a mention of a hash of all item names that can be accessed via an Item property on the tree node, but it doesn't seem to be available with VS2010.
TFS also provides a set of services that can be individually called for access to the resources they represent. You can get a reference to any service using the GetService method and it can take the type of the service you want to use as a parameter. 
TFS client object model provides a nice hierarchy of objects to use to our advantage. For example, you can navigate from the server scope to the project scope and then to the work items scope. And at any level you can enumerate the resources. This provides a convenient mechanism for the length and breadth of the organization.
The client object model is not the only option. There's the server object model to be used on the server side, and the build process object model on the build machine.
OData, on the other hand, gives the same flexibility outside the programming context. It is accessible over the web.
The way to navigate the OData feed from TFS is to use the OData model to find the work item we are interested in. Here's how it would look:
var proxy = CreateTFSServiceProxy(out baseURL, out teamProject, out workitemtype);
var workItems = proxy.Execute<WorkItem>(new Uri(string.Format(CultureInfo.InvariantCulture,
                                                              "{0}/Projects('{1}')/WorkItems?$filter=Type eq '{2}' and Title eq '{3}'&$orderby=Id desc",
                                                              baseURL,
                                                              teamProject,
                                                              WorkItemType,
                                                              WorkItemTitle)))
                  .First();
This lets us specify the filter in the Uri to get the results we want. Then they can be processed or reported in any way.
or another way to execute the queries could be as follows:
var queryProxy = new TFSQueryProxy(uri, credentials);   // takes the server Uri and an ICredentials
var queries = queryProxy.GetQueriesByProjectKey;


Monday, September 9, 2013

OData and TFS

TFS has a client object model. These are available via the Microsoft.TeamFoundation.Common and Microsoft.TeamFoundation.Client libraries for programmatic access. Using this library a query can be executed by instantiating the query class which takes the store and the WIQL as parameters.
The store can be found as follows:
1) Instantiate the TfsTeamProjectCollection with the Uri for the TFSServer, something like : http://server:port/tfs/web
2) get the work item store from 1) with GetService method
3) get the project from the work item store using the workItemStore.Projects["API"]
The Query class represents a query to the work item store. An executed query returns a WorkItemCollection. These and other objects can be browsed from Microsoft.TeamFoundation.WorkItemTracking.Client.dll, which is available under \Program Files\Microsoft Visual Studio 10.0\Common7\IDE\ReferenceAssemblies\v2.0 on computers where Team Explorer is installed.
Authenticated credentials may need to be used with the Team Foundation Server. An ICredentials object can be created to connect to the server; the password is required to create this object. The Team Foundation Server also provides IdentityDescriptors for impersonation, which means that you need not use the username and password.
Both the Uri and the ICredentials can be passed to the constructor of the TfsConfigurationServer object. The constructor also allows for mixed-mode authentication, where the credentials used to connect map to a Team Foundation identity and both authentication and impersonation are allowed.
Once the TFSConfigurationServer object is constructed, we can drill down to the objects we are interested in using the object model hierarchy or using search queries.
Queries can be executed by navigating to the QueryFolder for a QueryDefinition.
So code looks like the following:
var root = TfsConfigurationServerFactory.GetConfigurationServer(uri, iCredentials);
var projectCollection = root.GetTeamProjectCollection(NameOrGuid);
var store = projectCollection.GetService<WorkItemStore>();
var  teamProject = store.Projects["your_project_name"];
Assert(teamProject != null);
var queryResults = store.Query("your_WIQL_here");
or
var folder = teamProject.QueryHierarchy as QueryFolder;
foreach(var queryItem in folder)
{
// iterate
}
There is another way to get this data.
OData exposes a way to work with this data over the web. It is accessible from any device or application that supports HTTP requests. Think of OData as a web catalog browser of the client object model. For example, if you could enumerate some work item types with the client object model, then you can view them in a browser with OData. Scripts and programs can now work off of http requests.

Saturday, September 7, 2013

One of the tests for an OAuth provider could be to use the access tokens with an API that takes user information as a parameter. If none of the APIs use a user parameter and rely only on the access token, this test does not apply. However, using the user parameter for the current user whose access token has been retrieved should work, and using that of another user with the same access token should not.
The same user could sign in from multiple clients. Since the tokens, once issued, are usually valid for a duration of an hour, the provider does not limit the number of calls made in that duration. For the same reason, the number of clients used by the user should not be limited. Since the APIs in discussion are stateless, the number of such calls doesn't matter. That said, a single client may be hogging the provider. Throttling in both cases could be helpful, but it should not be done on a per-user or per-client basis; it could be done on an API-by-API basis. This is one of the functions of the proxy provider. Authorization denials are severe measures and should generally be a last resort.
Also, performance tests for an OAuth provider are important. If the existing load tests cover authorizations and revokes from a user on a client repeated, say, a thousand times at one-minute and five-minute intervals, that should work. Token expiry time is not specified by the user, so a test that involves a revoke prior to a re-authorization should work as an equivalent. The load test could have a variation of the different grant types for authorization. The functional tests or the build verification tests cover these, and they might have other tests that could be thrown into the mix. However, a single authorization and revoke of a token should be targeted in a separate run if possible. This should involve the authorization grant type that is most common. The test run that includes other kinds of tests could include not only hand-picked existing test cases but also a capture of peak-load traffic from external clients.
Tests could also target authorization denials from the API provider and the proxy independently. This should be visible from the responses to the authorization requests. The server property carries the source of the response. This is useful to know whether the token is invalidated because of an invalid user or an invalid client. Status code testing is not enough; the error message, if any, should also be tested. In the case of OAuth, providing an error message that mentions an invalid user or an invalid client could be helpful. A common error message is a 'developer inactive' message. This one is interesting because there seems to be an activation step involved.
Tests could cover spoofing identity, tampering with data, repudiation, information disclosure, denial of service and elevation of privilege.
One of the weaknesses of this mechanism is that the APIs have to comply in a certain way. For example, none of the APIs should expose the userId parameter. If APIs expose a user parameter, it should be enforced with an alias for client usage, even if those aliases are translated internally. Separating the user parameter of the API from the security mechanism that validates the user is important, because security is generally considered a declarative aspect and not part of the code of the API.
If the two were tied together, where the user information for the API is looked up via security token translation inside the API implementation instead of outside it as a parameter, each API requiring that may need to do the same. Instead, it is probably more convenient to maintain a list of APIs secured by privileged access tokens. For example, if an endpoint is marked internal, it should be enforced, say by making sure that the callers are internal or that it is packaged in an assembly that is not exposed. Tests should verify that all APIs are marked for use only with an access token, even if the tokens are not user privileged.