In this post, I share an easy way to generate a checksum of arbitrary text or of a file's content in Node.js.

A checksum (aka hash sum) calculated with a cryptographic hash function is a one-way process of mapping a large data set of variable length (e.g. a message or a file) to a smaller data set of fixed length (the hash). The length depends on the hashing algorithm (e.g. 128 bits for MD5, 160 bits for SHA-1).
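To see the fixed-length property in practice, the sketch below hashes a one-character string and a 100,000-character string with a few algorithms and prints the digest size in bytes. The helper name digestSize is hypothetical, used only for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical helper: size in bytes of the raw digest of `input`.
function digestSize(input, algorithm) {
  // digest() without an encoding returns a Buffer holding the raw hash bytes
  return crypto.createHash(algorithm).update(input).digest().length;
}

const shortInput = 'a';
const longInput = 'a'.repeat(100000);

for (const algorithm of ['md5', 'sha1', 'sha256']) {
  // The digest size depends only on the algorithm, not on the input length
  console.log(algorithm, digestSize(shortInput, algorithm), digestSize(longInput, algorithm));
}
// md5 16 16
// sha1 20 20
// sha256 32 32
```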

Note that "one-way" means it is not feasible to perform the reverse calculation, i.e. to compute the input data (message, file, etc.) from the checksum value. Even though it may be possible to find a message which produces the same checksum (for example, using rainbow tables), you can never tell from the checksum alone whether that message is identical to the original one.
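Because the function cannot be run backwards, the only way to recover an input from a checksum is to guess: hash candidate inputs and compare each result with the target. The sketch below does this with a plain list of candidates. It is a simple dictionary lookup, not an actual rainbow table (which precomputes chains of hashes to trade memory for time), and findCandidate is a hypothetical name. A match only shows that the candidate produces the same checksum.

```javascript
const crypto = require('crypto');

// Hypothetical helper: return the first candidate whose MD5 hex digest equals
// targetHex, or null if none of the guesses match.
function findCandidate(candidates, targetHex) {
  for (const candidate of candidates) {
    const hex = crypto.createHash('md5').update(candidate, 'utf8').digest('hex');
    if (hex === targetHex) {
      return candidate;
    }
  }
  return null;
}

const target = crypto.createHash('md5').update('secret', 'utf8').digest('hex');
console.log(findCandidate(['password', '123456', 'secret'], target)); // secret
console.log(findCandidate(['password', '123456'], target));           // null
```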

For checksum generation we can use Node's built-in crypto module. Its createHash(algorithm) function creates a checksum (hash) generator. Which algorithms are available depends on the version of OpenSSL on the platform. Some examples:

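  • md5 for the MD5 message-digest algorithm
  • sha1 for the SHA-1 cryptographic hash function

MD5 and SHA-1 are fine for detecting accidental corruption, but neither is collision-resistant any more, so prefer an algorithm such as sha256 when the checksum must resist deliberate tampering.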
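The generator returned by createHash() is incremental: update() can be called any number of times, and digest() finalizes the result. Feeding the input in pieces yields the same checksum as feeding it in one call, which is what the large-file approach later in this post relies on. A minimal sketch (the helper name digestOfParts is hypothetical):

```javascript
const crypto = require('crypto');

// Hypothetical helper: hash a list of string pieces with a single generator.
function digestOfParts(parts, algorithm) {
  const hash = crypto.createHash(algorithm);
  for (const part of parts) {
    hash.update(part, 'utf8'); // each call appends more input to the same hash
  }
  return hash.digest('hex'); // finalizes; this Hash object cannot be reused afterwards
}

const whole = digestOfParts(['This is my test text'], 'md5');
const pieces = digestOfParts(['This is ', 'my ', 'test text'], 'md5');
console.log(whole === pieces); // true
```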

To get a list of all available hash algorithms, use crypto.getHashes().

var crypto = require('crypto');

crypto.getHashes(); // [ 'dsa', 'dsa-sha', ..., 'md5', ... ] (exact list depends on the OpenSSL build)
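Since the list depends on the OpenSSL build, code that prefers a particular algorithm can check the list first and fall back if needed. A sketch; pickAlgorithm and the preference order are assumptions for illustration:

```javascript
const crypto = require('crypto');

// Hypothetical helper: return the first algorithm from `preferred` that the
// local OpenSSL build supports, or throw if none is available.
function pickAlgorithm(preferred) {
  const available = new Set(crypto.getHashes());
  for (const name of preferred) {
    if (available.has(name)) {
      return name;
    }
  }
  throw new Error('None of the preferred hash algorithms is available: ' + preferred.join(', '));
}

const algorithm = pickAlgorithm(['sha512', 'sha256', 'md5']);
console.log(algorithm, crypto.createHash(algorithm).update('This is my test text').digest('hex'));
```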

A simple function that generates a checksum value from static input:

var crypto = require('crypto');

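// Returns the digest of `str` (a string or a Buffer).
// algorithm defaults to 'md5'; the output encoding defaults to 'hex'.
function checksum (str, algorithm, encoding) {
    return crypto
        .createHash(algorithm || 'md5')
        .update(str, 'utf8') // 'utf8' applies to strings and is ignored for Buffers
        .digest(encoding || 'hex');
}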

checksum('This is my test text');         // e53815e8c095e270c6560be1bb76a65d
checksum('This is my test text', 'sha1'); // cd5855be428295a3cc1793d6e80ce47562d23def
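The encoding parameter only controls how the digest is presented; the underlying bytes are the same. The sketch below repeats the checksum function so it runs on its own, prints the MD5 of the same text as hex and as base64, and checks that both strings decode to the same 16 bytes.

```javascript
const crypto = require('crypto');

// Same checksum function as above, repeated so this snippet is self-contained
function checksum (str, algorithm, encoding) {
    return crypto
        .createHash(algorithm || 'md5')
        .update(str, 'utf8')
        .digest(encoding || 'hex');
}

const hex = checksum('This is my test text');
const base64 = checksum('This is my test text', 'md5', 'base64');
console.log(hex.length, base64.length); // 32 24

// Both strings encode the same 16 raw bytes
const sameBytes = Buffer.from(hex, 'hex').equals(Buffer.from(base64, 'base64'));
console.log(sameBytes); // true
```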

You can also calculate the checksum of a file's content using the following approach. Note that it reads the whole file into memory, so it should be used for small files only. Large files should be handled differently, as I will describe shortly.

var crypto = require('crypto'), fs = require('fs');

// checksum function definition as above
// Note that the content of the test.dat file is "This is my test text" (no trailing newline)

fs.readFile('test.dat', function (err, data) {
    if (err) {
        return console.error(err); // e.g. test.dat does not exist
    }
    console.log(checksum(data));         // e53815e8c095e270c6560be1bb76a65d
    console.log(checksum(data, 'sha1')); // cd5855be428295a3cc1793d6e80ce47562d23def
});
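In a one-off script, where blocking the event loop does not matter, the same small-file approach can be written synchronously with fs.readFileSync. A sketch; fileChecksumSync is a hypothetical name, and like readFile it loads the whole file into memory:

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Hypothetical helper: checksum of a (small) file's full content.
// Throws if the file cannot be read.
function fileChecksumSync(filePath, algorithm, encoding) {
  const data = fs.readFileSync(filePath); // Buffer with the whole file
  return crypto
    .createHash(algorithm || 'md5')
    .update(data)
    .digest(encoding || 'hex');
}

// Usage, assuming test.dat exists as in the example above:
// console.log(fileChecksumSync('test.dat'));         // MD5 in hex
// console.log(fileChecksumSync('test.dat', 'sha1')); // SHA-1 in hex
```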

Now let's look at how to handle big files. And how big is big? That depends on the context: sometimes it might be a few MB, and sometimes a GB or more.

The code snippet is as follows:

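var crypto = require('crypto'), fs = require('fs');

var hash = crypto.createHash('md5'),
    stream = fs.createReadStream('mybigfile.dat');

stream.on('error', function (err) {
    console.error(err); // e.g. the file does not exist or cannot be read
});

stream.on('data', function (data) {
    hash.update(data); // data is a Buffer chunk, so no input encoding is needed
});

stream.on('end', function () {
    console.log(hash.digest('hex')); // 34f7a3113803f8ed3b8fd7ce5656ebec
});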
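In practice you usually want the digest handed back to the caller and read errors reported rather than ignored. One way to do that is to wrap the same data/end logic in a Promise, as sketched below; hashFile and its default arguments are hypothetical, not part of the crypto API.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Hypothetical helper: resolve with the checksum of a file of any size,
// reading it chunk by chunk instead of loading it into memory at once.
function hashFile(filePath, algorithm = 'md5', encoding = 'hex') {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash(algorithm);
    const stream = fs.createReadStream(filePath);

    stream.on('error', reject);                       // e.g. file not found
    stream.on('data', (chunk) => hash.update(chunk)); // chunk is a Buffer
    stream.on('end', () => resolve(hash.digest(encoding)));
  });
}

// Usage:
// hashFile('mybigfile.dat', 'sha256')
//   .then((digest) => console.log(digest))
//   .catch((err) => console.error('Could not hash file:', err));
```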

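Note that the hasher (checksum generator) is updated with every chunk of data coming from the file stream (data event), and the digest is generated once all the stream data has been consumed (end event). Since only the current chunk is held in memory (64 KiB by default for fs read streams), memory use stays roughly constant regardless of the file size.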
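Hash objects are also streams (readable and writable, per the Node.js crypto documentation), so the manual data/end bookkeeping can be handed to pipe(). A sketch of that alternative; hashFileViaPipe is a hypothetical name:

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Hypothetical helper: pipe the file into a Hash stream and collect the
// hex-encoded digest it emits once the input has ended.
function hashFileViaPipe(filePath, algorithm = 'md5') {
  return new Promise((resolve, reject) => {
    const input = fs.createReadStream(filePath);
    let digest = '';

    input.on('error', reject); // pipe() does not forward source errors
    input
      .pipe(crypto.createHash(algorithm)) // returns the Hash stream
      .setEncoding('hex')                 // emit the digest as a hex string
      .on('data', (chunk) => { digest += chunk; })
      .on('end', () => resolve(digest))
      .on('error', reject);
  });
}

// Usage:
// hashFileViaPipe('mybigfile.dat').then((digest) => console.log(digest));
```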