Wednesday, October 2, 2013

In the previous post I summarized an implementation that relies almost exclusively on encrypting and decrypting tokens, with no need to keep tokens in a token table. That is great, but it requires versioning and backward compatibility whenever the token definition changes. We can attempt to capture all the fields required for encryption to build a token, but it is the protected resources that can require different privileges over time. There may therefore be variations to the scope parameter that we cannot anticipate, and there might also be revisions to the spec that we cannot anticipate. If we have to maintain backward compatibility with the API, there is more than one artifact to take care of. Instead, we could keep this flexible with a hash for a token whose attributes we maintain.
There are very few things we need to do, in addition to what we discussed, to have a token table. In this post, let us summarize the points for consideration with an implementation that does not encrypt/decrypt tokens but relies exclusively on looking up the hash of a token in a token table.
Here we will consider all improvements to the database and treat the security mechanism as equivalent to an API implementation that uses a database to store and look up tokens.
First, we will need to populate tokens from all workflows. This means we don't leave out tokens from any token-granting endpoint.
Second, we look up the token in an action filter so that API calls need not be invoked when validation fails.
Third, the token table should not keep tokens that have expired. A periodic background purge is good enough.
Fourth, the token table needs user and client ids as sequential integers in addition to the GUIDs, if any. The GUIDs may help correlate the token table with the user profiles elsewhere.
Fifth, RBAC and row-level security can be enhanced with the token table and scope parameter mapping. The scope can translate very simply to row-level labels applied to different resources.
Sixth, some columns may need to be added to the existing schema in other databases for this improved and enhanced security mechanism.
Seventh, the client and user id columns should be indexed for lookup of tokens by client and user.
Eighth, special-case client and user ids may need to be supported for scenarios such as resources accessed with guest access.
Ninth, upgrading existing tables should only involve the addition of columns for security mapping.
Tenth, the token hash may need to be random, hex or base64 encoded, and should be indexed as well.
Lastly, client registrations should flow into the corresponding client table.
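The points above can be sketched as a minimal token table. The following is only an illustration using SQLite; the table layout, column names, and helper functions are assumptions for the sketch, not part of the original design:

```python
import hashlib
import sqlite3
import time

# Minimal token table sketch: hashed token lookup with indexes on
# (client_id, user_id) and token_hash, plus a periodic purge of expired rows.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE tokens (
    token_hash TEXT NOT NULL,      -- hex-encoded hash of the issued token
    user_id    INTEGER NOT NULL,   -- sequential integer id
    client_id  INTEGER NOT NULL,
    user_guid  TEXT,               -- optional GUID to correlate with profiles elsewhere
    is_bearer  INTEGER NOT NULL DEFAULT 0,
    expires_at REAL NOT NULL)""")
db.execute("CREATE INDEX ix_tokens_hash ON tokens(token_hash)")
db.execute("CREATE INDEX ix_tokens_client_user ON tokens(client_id, user_id)")

def token_hash(token):
    return hashlib.sha256(token.encode()).hexdigest()

def issue(token, user_id, client_id, ttl=3600, is_bearer=False):
    db.execute("INSERT INTO tokens VALUES (?, ?, ?, NULL, ?, ?)",
               (token_hash(token), user_id, client_id, int(is_bearer),
                time.time() + ttl))

def lookup(token):
    # Returns (user_id, client_id, is_bearer) for a live token, else None.
    return db.execute("SELECT user_id, client_id, is_bearer FROM tokens "
                      "WHERE token_hash = ? AND expires_at > ?",
                      (token_hash(token), time.time())).fetchone()

def purge_expired():
    # The periodic background purge from the third point above.
    db.execute("DELETE FROM tokens WHERE expires_at <= ?", (time.time(),))
```

A failed `lookup` simply returns `None`, which is what the action filter in the second point would check before letting the API call proceed.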
To summarize, we have decided so far that
1) we can have a token table to store and lookup tokens that we issue OR we can generate tokens where the tokens bear user, client, time of issue and bearer status.
2) we can use encryption and decryption effectively to generate tokens. We could keep the encryption or decryption logic in the database.
3) Tokens that we generate will be tamperproof, cannot be faked and will have all the security information required to handle the requests.
4) Tokens can have fixed-length strings for user, client, time of issue, and bearer status, preferably integers.
5) On decryption, the client Id in the token will be validated against the API key of the caller to ensure that the caller is the authorized one.
6) The protected resources of the client should match the user information from the token.
7) These security validations happen as filters to the API calls so that the calls themselves don't have to implement any validation logic.
8) Audit trail of all token grants and calls can be made by intercepting traffic on the server side. URI logging may not work since bearer tokens may be passed in the request body.
9) Tokens are expected to be short-lived, such as for an hour, so clients are forced to request again.
10) Token granting and validation should be performant; the choice of technology should not degrade performance. In the bigger picture, the overall cost of token validation should not exceed one tenth of the time taken in network delays and hops to reach the authorization server.
11) User and client information from the token is independent of the userId and clientId in the API resource qualifier. The APIs could choose to shorten it.
12) token expiration could be validated as well
13) token with bearer flag will skip client validation
14) token should include some kind of magic number from authorization server
15) Client registration could be handled as part of webUI.
16) Client registration information could keep track of redirect uri
17) Authorization code could be distinguished from the token by a flag.
18) Code-to-token translation can be permitted if the code is validated just the same way as tokens are decrypted and validated.
19) Token granting can be based on any workflow, but the token itself does not need to have any information about the grant type. Special userId and clientId values will be required for cases when the userId is missing or the token is permitted to different client types.
20) Tokens, once issued, are generally not expected to be issued again.
21) By tokens we are referring to the access_token parameter in the above, and its length should preferably be within 50 characters.
22) New workflows could be created or existing workflows can be used for generating or using the same token.
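Points 4), 5), and 13) above can be illustrated with a small sketch of the fixed-length plaintext that would be encrypted. The field layout below is an assumption for illustration; the encryption step itself is left out, since any reversible cipher keyed by the server would do:

```python
import struct

# Fixed-length plaintext per point 4): user id, client id, time of
# issue (hhmmss as an integer), and bearer status, all as integers.
LAYOUT = ">IIIB"  # 4-byte user id, 4-byte client id, 4-byte hhmmss, 1-byte bearer flag

def pack_token_fields(user_id, client_id, hhmmss, is_bearer):
    return struct.pack(LAYOUT, user_id, client_id, hhmmss, int(is_bearer))

def unpack_token_fields(plaintext):
    user_id, client_id, hhmmss, bearer = struct.unpack(LAYOUT, plaintext)
    return user_id, client_id, hhmmss, bool(bearer)

def validate_client(plaintext, caller_client_id):
    # Point 5): the client id in the token is checked against the caller;
    # point 13): a token with the bearer flag skips client validation.
    _, client_id, _, is_bearer = unpack_token_fields(plaintext)
    return is_bearer or client_id == caller_client_id
```

Because the layout is fixed-length, the ciphertext length stays constant as well, which keeps the access_token within a predictable size as point 21) asks.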

Tuesday, October 1, 2013

Integer vs GUID columns for identifying users and clients in OAuth
When considering whether to have Integer Identity columns or GUID identity columns or both, I wanted to reflect on what others have been discussing in general about integer versus guid columns.
First, integer columns have an upper limit. If you are likely to exceed it, pick GUID.
Second, integer columns are sequential and indexed. If all your data is internal and no other system is reading or writing to it based on the Id, then integer is a good choice. By
sequential, we mean that inserts typically happen at the end, whereas GUID inserts can happen anywhere. Sequential also means indexed and therefore improved performance.
If data is to be merged, replicated, or synchronized with other systems, consider GUIDs for randomness and uniqueness.
Third, if you are giving an ID out, give out GUIDs. If you are keeping it internal, use integers.
Fourth, GUIDs make ID-based security attacks more difficult.
Fifth, if all things are equal, consider integer Ids for readability.
Next I want to discuss obfuscation of userId and clientId. The same integers for userId and clientId could yield the same encrypted string over and over again. If those become substrings that repeat between tokens, it becomes easier to crack the userId and clientId. There are many obfuscation techniques, but the ability to decrypt back to the original string should not be lost, otherwise the tokens will be compromised. Given these constraints, the hhmmss is a good addition to the original string so that the encrypted string varies each time. The granularity of token issue is now one second. Given that there are 3600 seconds in an hour, where each second a million users might use the same client to log in, the performance required is almost a single login per microsecond. This requires a performance comparison between decryption and database calls. Besides, the hhmmss could be guessed unless a company-specific magic number is also added to the remaining portion of the integer in which the hhmmss is stored.
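The hhmmss-plus-magic-number idea can be sketched as follows. The `MAGIC` constant stands in for an assumed company-specific secret, and the fixed-width string layout is illustrative only:

```python
import datetime

# Sketch: vary the pre-encryption string every second and mix a
# company-specific magic number into the same integer field that
# carries the hhmmss, so the time portion cannot be guessed.
# MAGIC is an assumed server-side secret, not part of the original post.
MAGIC = 0x5F3759

def plaintext_for(user_id, client_id, now=None):
    now = now or datetime.datetime.now()
    hhmmss = now.hour * 10000 + now.minute * 100 + now.second
    variant = hhmmss ^ MAGIC   # reversible mix; XOR again to recover
    return f"{user_id:010d}{client_id:010d}{variant:010d}"

def hhmmss_from(plaintext):
    # Recover the time of issue after decryption by undoing the mix.
    return int(plaintext[20:]) ^ MAGIC
```

The same user and client thus produce a different plaintext each second, and only a holder of `MAGIC` can recover or predict the hhmmss portion.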
Lastly, I want to discuss the tradeoff between using a token table versus token encryption/decryption from a performance standpoint. Database calls can scale. Encryption and decryption require space and time but can be done in memory. A token table is also not expected to grow to billions of records. Tokens will be inserted sequentially and in a forward-only manner, while cleanup happens on the other end where tokens expire. The cost of lookup can be minimized with the help of indexes based on clientId and userId. Many attributes can be added to the token in a token table, versus the candidate for encryption where space is at a premium. So a token table certainly has a lot of advantages and follows a well-known model of usage.
I mentioned in my previous post that the token should capture additional information such as whether it's a bearer token or not. Today I want to discuss the other attributes of the token, such as scope and state, that may also need to be applied during policy resolution, and a way to look them up. We mentioned that the scope and the state are not used for validating the client, so they need not be part of what makes the token. However, applying scope to a user's protected resources is also relevant to security policy implementation. By default the scope may not be specified or set, and the presence of a valid token could imply access to all of the user's protected resources. In such cases, no scope resolution is required. In the cases where scopes are specified and enforced, we need a mechanism that allows APIs to be called due to the presence of a token but fails when the required resources are outside the scope of the token. We will discuss scope storage/lookup and scope enforcement separately.
The security mechanism we discussed earlier is a layer above the API because it filters the calls to the API without the API knowing about it. During the calls, the user's resources are protected with row-level security. This is done with labels on the user's protected resources. Labels correspond directly to the scope.
Scope storage and lookup are facilitated by the token table. This doesn't mean scope labels cannot also be included in the token itself.
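The scope-to-label enforcement described above can be sketched as a simple row filter. The resource shape and label names here are hypothetical, chosen only to illustrate the idea:

```python
# Sketch of scope enforcement as row-level labels: each protected row
# carries a label, and a token's scope resolves to a set of permitted
# labels. An empty scope implies access to all of the user's protected
# resources, as described above.
resources = [
    {"id": 1, "owner": 7, "label": "photos"},
    {"id": 2, "owner": 7, "label": "contacts"},
    {"id": 3, "owner": 7, "label": "photos"},
]

def visible_rows(rows, user_id, scope):
    # scope is a set of labels looked up from the token table;
    # an empty set means unrestricted access to the user's own rows.
    return [r for r in rows
            if r["owner"] == user_id and (not scope or r["label"] in scope)]
```

An API call made with a token scoped to `{"photos"}` would see only the photo rows, while a token with no scope set sees everything the user owns.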

Monday, September 30, 2013

In my OAuth implementation, I talked about token provisioning for user and client by the OAuth server. I also talked about varying the token over time for calls by the same client on behalf of the same user. I would now like to describe the OAuth bearer token and its provisioning in my proposed OAuth implementation.
Bearer tokens are not the 'default' tokens. These are special tokens issued so that any source can access any protected resource. They are merely used to encapsulate user authorization. As long as they are transmitted securely and are tamper-proof, and with some other considerations discussed earlier, they can supposedly be used by any client to request a user's protected resources. This is merely a step to make it easy for user authorizations to be passed around for more reachability. In all usages of the bearer token, the client must authenticate the server before sending the token over. The tokens are supposed to be handled and passed around with care, since presumably anybody can use them. I consider bearer tokens a simple relaxation of client validation, using only the user authorization to access protected resources.
Therefore in my proposed OAuth provider implementation, I add the check to bypass client validation when the token is a bearer token.
This implies that we need to add information to the initial userId and clientId that we encrypt, in order to call it a bearer token. The way we store this information in the token is entirely up to us. We could choose to set a bit or a digit to indicate a bearer token, or add a column to our token table, among others, such that we can look up the token to see if it's a bearer token. Based on my previous post, I want to add the bearer information to the original string before I encrypt it. This way I have that information handy when I decrypt it.
Note that I do not consider other token attributes such as redirect uri, scope, state, etc. in the original information, because the only thing I'm validating is the client. Should the client validation be bypassed, I would like that information present in the decrypted string.
Next let us consider how to specify the bearer token in the original string before encryption. One way could be to use a special delimiter on the highest place order of the clientId.
Another way would be to specify an even number for the rotation of the userId and clientId.
Yet another way could be to add a bearer flag as a separate byte between the user Id and the client Id.
Yet another way could be to zero out the client Id to say that anyone can use it, or to special-case the client Id.
Thus there are many ways in which the bearer token can be specified in the original string.
No matter how we choose to capture the bearer information, we have to add a check so that client validation is bypassed when it's a bearer token.
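One of the encodings suggested above, a separate flag byte between the user Id and client Id, can be sketched as follows. The byte layout is an assumption for illustration:

```python
import struct

# One of the encodings suggested above: a single bearer flag byte
# placed between the user id and the client id in the original string
# that gets encrypted.
LAYOUT = ">IBI"  # 4-byte user id, 1-byte bearer flag, 4-byte client id

def make_original(user_id, client_id, is_bearer):
    return struct.pack(LAYOUT, user_id, int(is_bearer), client_id)

def check_client(original, caller_client_id):
    user_id, bearer, client_id = struct.unpack(LAYOUT, original)
    if bearer:
        return True  # bearer token: bypass client validation
    return client_id == caller_client_id
```

With the flag set, `check_client` succeeds for any caller, which is exactly the relaxation of client validation described above.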
In this post, I want to talk about the Application object in Microsoft Excel, which lets you work with Excel sheets programmatically.
You could write code such as follows:
Application.Windows("book1.xls").Activate
or
Set x1 = CreateObject("Excel.sheet")
x1.Application.Workbooks.Open "newbook.xls"

You also have a PivotTable and PivotChart object in the Excel Object Model.
They facilitate pivot transformation of data. You could use the PivotTableWizard to generate the PivotTable.
ActiveSheet.PivotTableWizard xlDatabase, Range("A1:C100")

Sunday, September 29, 2013

In the previous post, we discussed generating different tokens for the same user and client over time. We could do this based on a variant we add to the userId and clientId before encryption.
This was made possible with a hhmmss integer that we append to the UserId and ClientId.
This had the benefit that we used a fixed-length string to encrypt and decrypt. However, all of the original string may be known to the user. So if we want to keep some part of the original string unknown, we could add a magic number.
All of this could be avoided if we used an offset to rotate the userId+clientId string based on, say, hhmmss. The logic to translate the hhmmss to an offset that is within the bounds of 0 to the length of the original fixed-length string is entirely up to the server.
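The rotation idea can be sketched as below. The mapping from hhmmss to an offset is just one choice; as noted above, it is entirely up to the server:

```python
# Sketch of the rotation idea above: rotate the fixed-length
# userId+clientId string by an offset derived from hhmmss, so the
# pre-encryption string varies without appending extra fields.
def rotate(s, hhmmss):
    offset = hhmmss % len(s)   # one possible server-chosen mapping
    return s[offset:] + s[:offset]

def unrotate(s, hhmmss):
    offset = hhmmss % len(s)
    return s[-offset:] + s[:-offset] if offset else s
```

The server, knowing the hhmmss, can always undo the rotation after decryption and recover the original userId+clientId string.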
We also mentioned that with the OAuth server, we preferred to keep a token table that gets populated for all OAuth token grants. At the same time, we said that the APIs that validate the tokens need not rely on the token table and can directly decrypt the tokens to identify the userId and the clientId. The table is only for auditing purposes at this point.
The token validation does not need to happen within the API implementation, although APIs could choose to do so. That validation could be done with the ActionFilterAttributes we discussed earlier, or even via HTTP handlers. The URI and query string could still be passed to the API implementation.
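The post refers to ASP.NET ActionFilterAttributes; the following is only an analogous sketch in Python, using a decorator to show validation happening outside the API implementation. `decode_token` is a hypothetical stand-in for decrypting the token and recovering the userId and clientId:

```python
import functools

# decode_token stands in for decrypting the token and returning
# (user_id, client_id), or None when the token is invalid.
def decode_token(token):
    return (7, 42) if token == "valid-token" else None

def require_token(api_call):
    # The filter wraps the API call; the call itself contains no
    # validation logic, mirroring the action-filter approach above.
    @functools.wraps(api_call)
    def wrapper(token, *args, **kwargs):
        identity = decode_token(token)
        if identity is None:
            return {"status": 401}   # the API is never invoked
        user_id, client_id = identity
        return api_call(user_id, client_id, *args, **kwargs)
    return wrapper

@require_token
def get_profile(user_id, client_id):
    # API implementation receives the ids already extracted from the token.
    return {"status": 200, "user": user_id}
```

An invalid token is rejected before `get_profile` runs, which is the point made above that failing calls never reach the API implementation.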
The id conversion may need to occur if the APIs would like to get the userId and clientId from the token itself so that the API resources do not require a user or client in the resource qualifiers. This is because the id is integer based. If the earlier implementation was based on GUIDs, the id and GUID for the same user or client may need to be looked up.
APIs are not expected to do away with user or client ids, since the token is a security artifact and not a functional artifact. To the user it could be redundant or an additional task to provide user and client information along with a token. To the API implementation, irrespective of the caller, the userId and clientId could be parameters so that callers can look up the information when the parameter values change.
That said, most of the resources are based on user profiles and have nothing to do with client profiles. If the users are already being aliased so that they don't have to enter their userId, then the resource qualifiers for the API implementations can certainly choose not to require userId and clientId. This will benefit the users who call the API.
I mentioned earlier that on mobile devices where text entry and input is generally difficult at this time, it is better to require less when the users have to specify the API directly.
Lastly, the database schema may need to be modified if ID parameter is not already what is currently being proposed.