Cluster computing

Tuesday, October 1, 2013

Integer vs GUID columns for identifying users and clients in OAuth
When considering whether to have Integer Identity columns or GUID identity columns or both, I wanted to reflect on what others have been discussing in general about integer versus guid columns.
First integer columns have an upper limit. If you are likely to exceed, pick GUID
Second integer columns are sequential and indexed. If all your data is internal and no other system is reading or writing to it based on the Id, then integer is a good choice. By
sequential, you mean that inserts are typically happening at the end whereas GUID inserts can happen. Sequential also means indexed and therefore improved performance.
If data is to be merged or replicated or synchronized with other systems, consider GUID for randomness and uniqueness.
Third if you are giving an ID out, give out GUIDs. If you are keeping it internal use integer
Fourth, GUID makes ID based security attacks more difficult
Fifth, if all things are equal, consider Id for readability.

Next I want to discuss obfuscation of userId and clientId. Both integers for userId and clientId could yield the same encrypted string over and over again If that becomes the substrings in each token such that they repeat between tokens, it will become easier to crack the userId and clientId. There are many obfuscation techniques but the ability to decrypt back the original string should not be lost otherwise the tokens will get compromised. Given these the hhmmss is a good addition to the original string so that the encrypted string varies each time. Just that the granularity of the time for tokens to be issued is now 1 second. Given that there are 3600 seconds where each second a million users might use the same client to login, the performance required is almost a single login per micro second. This requires performance comparision between decryption or database calls. Besides the hhmmss could be guessed unless a company specific magic number is also added to the remaining portion of the integer in which the hhmmss is stored.
Lastly I want to discuss the tradeoff between using a token table versus token encryption or decryption from a performance standpoint. The database calls can scale. Encryption and decryption will require space and time but can be done in memory. A token table is also not expected to grow to billions of records. Tokens will be inserted sequentially and in a forward only manner while cleanup happens on the other end where tokens expire. The cost to lookup can be minimized with the help of indexes based on clientId and userId. Many attributes can be added to the token in a token table versus the candidate for encryption where space is a premium. So a token table certainly has a lot of advantages and follows well known model for usage.

Cluster computing

Tuesday, October 1, 2013

No comments:

Post a Comment