Tuesday, September 17, 2013

Today I tried out the Enterprise Library Data Access block and it was a breeze. To those familiar with Entity Framework, this provides more streamlined access to data. For example, you can wire up stored procedure results to a collection of the models you define, so that you work with those instead of the entire object graph. There is a lot of debate around the performance of Entity Framework, and in earlier posts I may have alluded to the different levers we have to improve it. The Enterprise Library, however, comes with the block pattern these libraries have become popular for. Blocks are patterns reusable across applications, so your development time is cut down and you get the reliability and performance these libraries are known for.
I want to bring up the fact that we get hold of the database using the convenient DatabaseFactory.CreateDatabase method, which works against the existing databases on the SQL Server. Some data access extensions may need to be written to translate the data reader columns to objects, and this helps because you can translate the results of a stored procedure execution directly into a collection of the model objects you have already defined, without the onus of the object graph.
In addition, no configuration sections are involved in the config files, and the assemblies can be installed and added to the solution using the Visual Studio NuGet package manager.
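As a minimal sketch of this usage (the connection string name "RetailDb", the stored procedure name, and the Product model are placeholders; on Enterprise Library 6 the database provider factory must be configured before the first call):

using System.Collections.Generic;
using System.Linq;
using Microsoft.Practices.EnterpriseLibrary.Data;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ProductRepository
{
    public static List<Product> GetProducts()
    {
        // Resolve the database from configuration by connection string name.
        Database db = DatabaseFactory.CreateDatabase("RetailDb");
        // Map the stored procedure's result set directly onto the model;
        // the default mapper matches column names to property names.
        return db.ExecuteSprocAccessor<Product>("GetProducts").ToList();
    }
}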

OAuth bearer tokens practice

OAuth bearer tokens may currently be passed in the URL, but the RFC clearly calls out that this should not be done. Therefore, checks and other mechanisms to safeguard these tokens should be in place. As an example, this parameter could be passed in the request body instead, or the authorization server may handle client validations; the rest is up to the implementation.
In general, if the server and the clients communicate via TLS and have verified the certificate chain, there is little chance of a token falling into the wrong hands. URL logging or an HTTPS proxy are still vulnerabilities, but a man-in-the-middle attack is less of an issue if the client and the server exchange session ids and keep track of each other's session id. From an API implementation standpoint, session ids are largely a site or application concern and not the API's, but it is good to validate against the session id where one is available.
Sessions are unique to the application. The client may use refresh tokens or re-authorizations to keep the session alive. If the API kept track of sessions, they would not be tied to OAuth revokes and re-authorizations, so relying on the session id alone is not preferable. At the same time, using the session id as an additional parameter to confirm along with each authorization helps tighten security. It is safe to assume the same session prevails until the next authorization or an explicit revoke. By tying the checks exclusively to the token, we keep this streamlined to the protocol.
OAuth can be improved upon, but it certainly enables redirections that make things easier for the user. In addition, tokens with expiry dates let clients reduce the chatter with the authorization server.
In addition, many applications can now redirect to each other for the same user's authorizations, so the user has to sign in far less often than before. If the user is signed in to a few sites, that existing signed-in status can be used to gain access to other sites. This is not a mere convenience: it lets the same user float between sites and lets applications integrate and share user profile information for a richer user experience.

Monday, September 16, 2013

Tests for the client validation changes include the following (a sketch of the first case appears after the list):
1) specify one client based token grant and access by another client
2) specify token grant to one client and revoke by same client and reuse of a revoked token by the same client
3) specify token grant to one client and revoke by a different client
4) specify token grant to one client, revoke by a different client, and reuse by the original client
5) specify low privileged token grant to one client, specify high privileged token grant to same client, use of both tokens by the same client
6) specify low privileged token grant to one client, access low privileged token by another client
7) specify user privileged token grant to one client, specify token grant by same user to another client, clients exchange token
8) specify user privileged token grant to one client, specify token grant by different user to same client, client swaps token
9) specify user privileged token grant to one client, have the client request tokens up to a large number
10) specify user privileged token grant to multiple clients until a large number of clients is reached
11) specify user privileged token grant and revoke to the same client a large number of times
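As a sketch of how the first case might be automated (TestClients, GrantToken, and CallApi are hypothetical helpers wrapping the token and resource endpoints; the assertion style is MSTest):

[TestMethod]
public void TokenGrantedToOneClientIsRejectedForAnother()
{
    // A token granted to client A must not be honored when presented by client B.
    string token = TestClients.ClientA.GrantToken();
    HttpStatusCode status = TestClients.ClientB.CallApi("/user/me/profile", token);
    Assert.AreEqual(HttpStatusCode.Unauthorized, status);
}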

Delegated tokens or bearer tokens
The RFC makes special provisions for bearer tokens: any party in possession of one can use it to access any resource it protects. Therefore bearer tokens should be stored and transmitted with care.
For example, these tokens can be sent in the following ways:

1) When the access token is sent in the Authorization header of the HTTP request, a predefined syntax of the form "Bearer 1*SP b64token" is used, where b64token is base64-encoded.
As an aside, the b64token syntax allows one or more alphanumeric characters or the special characters -, ., _, ~, +, /, followed by optional trailing = padding.
2) The bearer token could be sent in the request body with the "access_token" parameter using "application/x-www-form-urlencoded".
3) The URI query parameter could also carry the token as "access_token=", but then it should be sent over TLS, along with a "Cache-Control" header specifying the private option.
Since URIs are logged, this method is vulnerable and is discouraged by the RFC: it documents current usage but goes as far as saying the method "SHOULD NOT" be used, and the parameter name is a reserved keyword.
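For reference, a minimal sketch of the first, header-based method from a .NET client (the URL is a placeholder):

using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static async Task<string> GetProfileAsync(string token)
{
    using (var client = new HttpClient())
    {
        // Send the bearer token in the Authorization header, the method the RFC prefers.
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", token);
        return await client.GetStringAsync("https://api.example.com/user/me/profile");
    }
}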

If the request is authenticated, it can be answered with error details such as invalid_request, invalid_token, and insufficient_scope, as opposed to unauthenticated requests, which should not be given any error information.
Threats can be mitigated if:
1) tokens are tamper-proof
2) tokens are scoped
3) tokens are sent over TLS
4) TLS certificate chains are validated
5) tokens expire in a reasonable time
6) the token exchange is not vulnerable to eavesdropping
7) the client verifies the identity of the resource server (known as securing the ends of the channel)
8) tokens are not stored in cookies or passed in page URLs
In this post, we talk about client registrations. OAuth specifies that clients be given a set of credentials they can use to authenticate with the server. This is much like a user name and password, except that the client password is Base64 encoded and is called the client secret. The client id and secret are issued at the time of registration. The authorization server, which also has the WebUI, could therefore host this registration site and thereby reduce the dependency on the proxy. Besides, this integrates developers with the regular users of the site.
Every piece of information the client provides is important. The access token that is issued carries some of the parameters we talked about earlier, such as scope and state. However, one field I would like to bring up in this post is the URI field: the redirection URI and state from the client. This is seldom used but is a great way to enforce additional security.
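A sketch of such an enforcement, assuming a hypothetical ClientRegistry lookup for the URI registered by the client:

public static bool RedirectUriMatches(string clientId, string requestedRedirectUri)
{
    // Only an exact match against the registered redirection URI passes;
    // anything else should produce an error, never a redirect.
    Uri registered = ClientRegistry.GetRedirectUri(clientId);
    Uri requested;
    return Uri.TryCreate(requestedRedirectUri, UriKind.Absolute, out requested)
        && requested == registered;
}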
Among the things to move from the proxy to the provider are the token mapping table, the per-API validation that the caller is known and that the token presented is the one issued to that caller, and the checks for a valid user at each authorization endpoint where user authorization is requested.
WebUI redirection tests are important, and for this a sample test site can be written that redirects to the OAuth WebUI for all users and handles the responses back from the WebUI. A test site makes the redirects visible in the browser.
The test site must exercise the WebUI against all kinds of user responses to the OAuth UI, in addition to testing the requests and responses from the WebUI.
WebUI testing involves a test where the user sees more than one authorized client. Updates to this list are part of WebUI testing, so the registration and removal of apps from the list have to be tested. This can be done with authorization requests to the server using different clientId and clientSecret pairs. The list of clients comes up in HTML, so the HTML may have to be parsed to check for the names associated with the different clientIds registered, as sketched below.
Lastly, WebUI error message handling is equally important. If appropriate error messages are not provided, the user may not be able to take corrective steps. Moreover, the WebUI properties matter to the user in that they provide additional information or self-help. None of the links on the WebUI should be broken or misspelled. The WebUI should provide as much evidence of its authenticity as possible; this way it provides additional deterrence against forgery.
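A sketch of the HTML check mentioned above (the URL and client display name are placeholders; the synchronous .Result call is a test-only simplification):

[TestMethod]
public void AuthorizedAppsListShowsRegisteredClient()
{
    using (var http = new HttpClient())
    {
        // After authorizing "Test Client A", its display name should appear
        // in the authorized-applications page rendered by the WebUI.
        string html = http.GetStringAsync("https://localhost/oauth/authorized-apps").Result;
        Assert.IsTrue(html.Contains("Test Client A"));
    }
}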

Sunday, September 15, 2013

This post discusses changing the APIs to remove all user/me resource qualifiers from the API config routes. If the OAuth implementation doesn't restrict a client from using the notion of a superuser who can access other user profiles via /user/id, that just means the protocol is flexible.
Meanwhile, this post also talks about adding custom validation via ActionFilterAttributes.
For performance, should we skip token validation on all input parameters?
This question is important because it trades security for performance, and the tradeoff may have implications beyond the customer.
That said, even on the critical code path, security has to be applied to both the administration endpoints and the token granting endpoints.
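A minimal sketch of such a validation filter for ASP.NET Web API (TokenStore.IsValid is a hypothetical lookup against the token table):

using System.Net;
using System.Net.Http;
using System.Web.Http;
using System.Web.Http.Controllers;
using System.Web.Http.Filters;

public class RequireValidTokenAttribute : ActionFilterAttribute
{
    public override void OnActionExecuting(HttpActionContext actionContext)
    {
        // Reject any request that lacks a valid bearer token before the action runs.
        var auth = actionContext.Request.Headers.Authorization;
        if (auth == null || auth.Scheme != "Bearer" || !TokenStore.IsValid(auth.Parameter))
        {
            actionContext.Response = actionContext.Request.CreateErrorResponse(
                HttpStatusCode.Unauthorized, "invalid_token");
        }
    }
}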
The token granting mechanisms also need to make sure the following are correct (a token generation sketch appears after the list):
1) tokens are not rotated or reused.
2) the token hash is generated using the current timestamp.
3) the token hash is not based on the userId and clientId.
Should the tokens be encrypted instead, they could embed the userId and clientId so that they can be decrypted.
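As a sketch of 2) and 3), a token hash can be derived from random bytes plus the current timestamp, with no user or client input (all names here are illustrative):

using System;
using System.Security.Cryptography;

public static class TokenIssuer
{
    public static string NewToken()
    {
        // Random bytes plus the current timestamp; deliberately independent
        // of userId and clientId so the hash reveals nothing about either.
        var random = new byte[32];
        using (var rng = new RNGCryptoServiceProvider())
            rng.GetBytes(random);
        var stamp = BitConverter.GetBytes(DateTime.UtcNow.Ticks);
        var input = new byte[random.Length + stamp.Length];
        Buffer.BlockCopy(random, 0, input, 0, random.Length);
        Buffer.BlockCopy(stamp, 0, input, random.Length, stamp.Length);
        using (var sha = SHA256.Create())
            return Convert.ToBase64String(sha.ComputeHash(input));
    }
}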
A third post will talk about client registrations separately, since they are currently tied to the proxy and not in immediate scope.


In this post, we will describe the implementation in a bit more detail.
First we will describe the database schema for the OAuth tokens.
Then we will describe the logic in the controllers that validates and filters out bad requests and those explicitly prohibited.
Then we will describe the tests for the WebUI integration. Throughout the implementation we will use the features available from the existing proxy and stage the changes needed to remove the proxy.
First among these is the schema for the token table. This table maps the userId and the clientId to the issued token. It exists entirely for our compliance with OAuth security caveats; users, clients, and the proxy are unaware of it and need not make any changes on their side due to any business rules associated with this table, as long as they are OAuth compliant. Since the token is issued by the proxy, we will need to keep the token request and response information in this table. In addition, we will record the apiKey and clientId from the client along with the token, even though the proxy may be enforcing this already. (Note that the clientId is internal and the apiKey is public; they are different.) As described in the previous post, this helps us know who the token was originally intended for and whether any misuse occurs by a third client. We will keep the user mapping optional, or require a dummy user, since some clients may request credentials only to access non-privileged resources. It is interesting to note that the userId is entirely owned by our API and retail company, but the check that an access token issued on behalf of one user is used only with that user's resources is currently enforced by the proxy. That means the proxy is keeping track of issued tokens with their user context, or is passing the user context back to the API with each incoming token.
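A sketch of the token entity, code-first style; the column names and types here are assumptions rather than the final schema:

using System;

public class OAuthToken
{
    public int Id { get; set; }                // surrogate key
    public string Token { get; set; }          // token as issued by the proxy
    public string ApiKey { get; set; }         // public key presented by the client
    public int ClientId { get; set; }          // internal client identifier
    public int UserId { get; set; }            // dummy user for client-credential-only grants
    public DateTime IssuedAt { get; set; }
    public DateTime ExpiresAt { get; set; }
    public bool Revoked { get; set; }
}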
But we have raised two questions already. First, should we treat the user context as nullable, or should we default to a dummy user? Second, does the proxy pass back the user context in all cases, or should the API implement another way to look up the user given a token for non-proxy callers?
Let us consider the first. Making the field non-nullable, with a default value for client-credential-only calls, certainly improves validation and governance. Moreover, we can then establish a foreign key with the user table so that we can look up user profile information directly off the token after token validation. This leads the way to removing the ugly "/user/me" resource qualifier from all the APIs that access user-privileged resources. The userId is for internal usage only anyway, so the APIs look cleaner and we can internally catalog and map the APIs to the type of access they require. This means having another table listing all the API routes, with classified access such as user-privileged or general public. That table is not just an API security table; it also provides a convenient placeholder for generating documentation and checking the correctness of listings elsewhere. Such additional dedicated resources for security could be considered overhead, but we will try to keep it minimal here. Without this table, we assume that each API accessing privileged resources internally applies the userId retriever ActionFilterAttribute, and that the retriever applies and enforces the necessary security.
We will also answer the second question this way. The proxy provides user information via the "X-Mashery-Oauth-User-Context" request header. This tells us that the token has been translated to a user context and that the proxy has looked it up in a token database. That token database does not serve our API security; otherwise we would not be discussing our schema in the first place. So let's first implement the schema, and then we will discuss steps 2 and 3.
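A small sketch of reading that header in a Web API handler (the header name is from the proxy; everything else is illustrative):

using System.Collections.Generic;
using System.Linq;
using System.Net.Http;

public static class UserContextReader
{
    public static string GetUserContext(HttpRequestMessage request)
    {
        // The proxy translates the incoming token and relays the user
        // context in this header; an absent header means no user context.
        IEnumerable<string> values;
        return request.Headers.TryGetValues("X-Mashery-Oauth-User-Context", out values)
            ? values.First()
            : null;
    }
}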

Saturday, September 14, 2013

Improvements to OAuth 2.0, if I could venture a guess:
1) Tokens are not hashes but encrypted values carried in the request and response body. A token is an end result representing an agreement between the user, the client, and one or more parties. A given token could be like a title on an asset, immediately indicating the interests vested in the asset, be it the bank, an individual, or a third party. However, the token must be interpretable to the authorization server alone and nobody else. It should be opaque to both the user and the client and understood only by the issuer, who alone can tie it back to the caller and the approver. The client and the user can use their state and session properties respectively to hold the server to account; in both cases, this has nothing to do with the token. Such token processing obviates persisting tokens in a store for lookup on each API call, moving the performance overhead from storage and CPU to mostly CPU, which should be welcome on server machines. That said, almost every API makes a call to a data provider or database, and yet another call to associate a hash with a user and a client is not only simpler but archivable and available for audit later. Test will find the hash option easy, more maintainable, and common to both dev and test. That option also scales up and out just like any other API call, and since it is a forward-only rolling population, there is the possibility of keeping the size finite and even recycling the tokens. The internal representation of the user and the client has nothing to do with the information exchanged in the query string, so the mapping is almost guaranteed to be safe and secure. This approach also has the conventional merits of good bookkeeping with database technologies, such as the ability to do change data capture, archivals, prepared plan executions, etc. In the encryption-based token scenario, the entire request and response capture may need to be taken for audit and then parsed to isolate the tokens and discover their associations, and each discovery may need to be repeated over and over in different workflows. Besides, an encrypted string is not as easy to cut and paste as a hash in cleartext on HTTP. That said, encryption of request parameters or of API call signatures is already used in practice, so an encryption-based token should not have a high barrier to adoption. Besides, the tokens are cheap to issue and revoke. (A sketch of an encryption-based token appears after this list.)
2) Better consolidation of the grant methods offered. This was somewhat alluded to in my previous post; it would simplify the endpoints and the mechanisms to what is easy to propagate. For example, the authorization code grants and the implicit grants need not come from an endpoint different from the token granting endpoint, since they are semantically the same. Code-to-token and refresh could be considered different, but in the end internal and external identifiers will be maintained anyway, be it for user, client, or token. Hence, the ability to treat all clients as one, to treat access tokens with or without user privileges uniformly, and to treat all requests and responses as pipelined activities will make this more streamlined. Finally, an OAuth provider does not need to be distributed between the proxy and the retailer; in some sense, these can all be consolidated in the same stack.
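Returning to 1), a sketch of an encryption-based token: the server encrypts the userId, clientId, and a timestamp with a key only it holds, so the token is opaque to the user and the client yet needs no lookup table. Key management is out of scope and all names here are assumptions:

using System;
using System.Security.Cryptography;
using System.Text;

public static class EncryptedTokenIssuer
{
    public static string NewToken(int userId, int clientId, byte[] key, byte[] iv)
    {
        // Only the issuer holds the key, so only the issuer can decrypt the
        // token back to the user and client it represents.
        var payload = Encoding.UTF8.GetBytes(
            string.Format("{0}|{1}|{2:o}", userId, clientId, DateTime.UtcNow));
        using (var aes = Aes.Create())
        using (var enc = aes.CreateEncryptor(key, iv))
            return Convert.ToBase64String(enc.TransformFinalBlock(payload, 0, payload.Length));
    }
}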