Friday, February 12, 2021

Generating hashes


Hash generation is an important step in the versioning and signing stages of software development and release. A hash is a fingerprint for the associated artifact. With this identifier, the artifact can be tracked or referenced, with the guarantee that if the content changes, the hash changes as well (barring a collision, which is computationally infeasible to produce for a strong hash function). The hash therefore attests to the integrity of the file.
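As a minimal sketch of the fingerprint idea, the snippet below hashes two byte strings with Python's standard hashlib module. The choice of SHA-256, the helper name fingerprint and the sample contents are assumptions made for illustration only.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical artifact contents, differing by a single character.
original = b"release-1.0 contents"
modified = b"release-1.0 contents!"

print(fingerprint(original))
# Same content always yields the same fingerprint.
print(fingerprint(original) == fingerprint(original))
# Changed content yields a different fingerprint.
print(fingerprint(original) == fingerprint(modified))
```

For large files, the same digest can be built incrementally by calling update() on the hash object with successive chunks instead of loading the whole file into memory.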

Popular families of hashes are Message Digest (MD5) and Secure Hash Algorithm (SHA). SHA-1 hashes are 160 bits, or 20 bytes, long and are usually written as 40 hexadecimal digits. The design is similar to Rivest's MD4 and MD5.

SHA-1 keeps a state of five unsigned 32-bit words, h0, h1, h2, h3 and h4, and treats all words as big-endian. First, preprocess the message. Append the bit 1 to the message. Append zero bits as padding until the message length is congruent to 448 modulo 512. Append the original message length in bits as a 64-bit unsigned big-endian number, so that the total length is a multiple of 512 bits. Then process the message in successive 512-bit chunks. For each chunk, break the chunk into sixteen 32-bit big-endian words. Extend the sixteen 32-bit words into eighty 32-bit words this way: for each word from index 16 to 79, XOR the words that appear 3, 8, 14, and 16 words earlier and then rotate the result left by 1. Initialize five working variables a, b, c, d and e for this chunk from h0, h1, h2, h3 and h4. In the main loop, for i from 0 to 79, divided into four equal ranges of twenty rounds, combine b, c and d with bitwise 'and', 'or', 'xor' and 'not' in a predefined manner specified differently for each range, along with a constant for each range. Then shift the working variables along: e takes d, d takes c, c takes b rotated left by 30, b takes a, and a takes the sum of a rotated left by 5, the range function's result, e, the range constant and the current word. At the end of the loop, add the working variables to the hash values so far, modulo 2^32. The final hash value is the concatenation of h0 through h4.

Keyed MD5 produces a cryptographic checksum for a message as m + MD5(m + k), where k is a key and + denotes concatenation. Hashes are also used with certificates. A certificate is a document that carries a digital signature and is signed by a Certification Authority. Certificates enable public-key authentication, in which A sends E(x, Public-B) to B and B sends back the decrypted x, showing that B holds the matching private key.
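The SHA-1 steps described above can be sketched in Python as follows. This is an illustrative implementation for following the algorithm, not something to use in place of hashlib; the initial state words and round constants are the ones published in the SHA-1 specification, and the function name sha1_hex is an assumption of this sketch.

```python
import struct

MASK = 0xFFFFFFFF  # keep every intermediate value within 32 bits


def _rotl(x: int, n: int) -> int:
    """Rotate a 32-bit integer left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def sha1_hex(message: bytes) -> str:
    # Initial state: five 32-bit words from the SHA-1 specification.
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    # Preprocessing: append bit 1 (byte 0x80), zero-pad to 448 mod 512 bits,
    # then append the original length in bits as a 64-bit big-endian integer.
    bit_length = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", bit_length)

    # Process successive 512-bit (64-byte) chunks.
    for offset in range(0, len(padded), 64):
        # Sixteen big-endian 32-bit words, extended to eighty words.
        w = list(struct.unpack(">16I", padded[offset:offset + 64]))
        for i in range(16, 80):
            w.append(_rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))

        a, b, c, d, e = h  # working variables for this chunk

        for i in range(80):
            # Four ranges of twenty rounds, each with its own function and constant.
            if i < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif i < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif i < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (_rotl(a, 5) + f + e + k + w[i]) & MASK
            a, b, c, d, e = temp, a, _rotl(b, 30), c, d

        # Add this chunk's result to the running hash, modulo 2^32.
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]

    # The digest is the concatenation of the five state words.
    return "".join("%08x" % word for word in h)


print(sha1_hex(b"abc"))
```

The output can be checked against hashlib.sha1(b"abc").hexdigest() from the standard library, which should give the same 40 hexadecimal digits.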

Like versioning using hashes, signing is the process by which a digital signature is created from the file contents. The signature proves that the contents of the file have not been tampered with. Signing does not need to encrypt the file contents to generate the signature. In some cases, a detached signature is stored as a separate file. Others may choose to include the digital signature along with the set of files in an archive. The signature differs from the fingerprint or hash in that a key pair is involved, which allows the content to be irrefutably attributed to the purported origin. Signing uses a private-public key pair to compute the digital signature. The private key is used to sign a file while the public key is used to verify the signature. The public key can be published with the signature, or it can be made available in ways that are well known to the recipients of the signed files.
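The sign-with-private-key, verify-with-public-key flow can be illustrated with a toy example. The sketch below uses textbook RSA over a SHA-256 digest with a deliberately tiny key built from two known Mersenne primes; it omits the padding that real signature schemes require and is not secure. The names sign and verify and the key values are assumptions chosen for illustration (requires Python 3.8+ for the modular inverse via pow).

```python
import hashlib

# Toy key pair: two Mersenne primes, far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent


def _digest_as_int(data: bytes) -> int:
    """Hash the data and reduce the digest into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Create a signature with the private key (D, N)."""
    return pow(_digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Check a signature with the public key (E, N)."""
    return pow(signature, E, N) == _digest_as_int(data)


contents = b"contents of the released file"
detached_signature = sign(contents)

print(verify(contents, detached_signature))         # untouched contents
print(verify(contents + b"!", detached_signature))  # tampered contents
```

Note that the file contents are never encrypted here: only the digest is transformed with the private key, and the signature travels separately from the data, much like a detached signature file.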

Signing can use any of several public-key algorithms. The stronger the algorithm and key, the stronger the signature and the lower the chance that the file could have been tampered with undetected. The process of signing varies across operating systems.

Git popularized the use of the 'gpg' tool to sign and verify files. This tool can also generate the key pair with which to sign the files. The resulting signature is in the Pretty Good Privacy (PGP) format and, when ASCII-armored, is stored as a file with the extension .asc. Publishing the public key along with the detached signature is a common practice for many distributions of code. GitHub addresses universal versioning and is widely adopted across companies. There is even a translation from Git logs to Semantic Versioning 2.0.0, where the former captures the changes to the versions in a workspace of files while the latter standardizes the meaning and form used to represent versions. GitVersion is a tool that can translate the individual versions into semantic versions. With the ability to create unique, universal versions, to give every change a version, and to translate those versions into a form compliant with the Semantic Versioning standard, GitHub seems like a one-stop shop for versioning, tracking and building a management system around changes.
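To make the idea of translating a commit log into a semantic version concrete, here is a small sketch. It assumes a hypothetical convention in which commit messages carry a marker such as "+semver: major", "+semver: minor" or "+semver: patch", and it falls back to a patch bump when no marker is present. It is not GitVersion's actual algorithm, and the function name next_version is made up for this example.

```python
import re

# Hypothetical marker convention assumed for this sketch.
MARKER = re.compile(r"\+semver:\s*(major|minor|patch)", re.IGNORECASE)
RANK = {"patch": 0, "minor": 1, "major": 2}


def next_version(current: str, commit_messages: list) -> str:
    """Return the next MAJOR.MINOR.PATCH version given commits since `current`."""
    major, minor, patch = (int(part) for part in current.split("."))

    # The most significant marker found in any commit decides the bump.
    bump = "patch"
    for message in commit_messages:
        match = MARKER.search(message)
        if match and RANK[match.group(1).lower()] > RANK[bump]:
            bump = match.group(1).lower()

    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


print(next_version("1.4.2", ["fix typo", "add export option +semver: minor"]))
```

Following Semantic Versioning 2.0.0, a major bump resets the minor and patch numbers to zero, and a minor bump resets the patch number.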

With versioning and signing becoming standard offerings from vendors of source control, object stores and artifact repository managers, applications can choose to offload these activities to stacks that lower their Total Cost of Ownership.

