Demystifying Merkle Trees: A Step-By-Step Implementation Guide With Theory

Demystifying Merkle Trees: A Step-By-Step Implementation Guide With Theory

INTRODUCTION

Hello everyone in this article I will simply explain to you all about the Merkle tree as well as how we can implement the Merkle tree. First, I will explain to you some basic things about the Merkle tree then we will move to the implementation part. I tried to explain every line of code so that you all can understand it simply. So let's get started.

WHAT IS THE MERKLE TREE DATA STRUCTURE

First, We need to understand three minor topics that are "What is the Merkle tree?" "What is Merkle root?" and the last thing "What is Merkle proof?"

MERKLE TREE

A Merkle tree also known as a hash tree is a cryptographic data structure used in computer science and blockchain technology. It's a tree structure where every leaf node("Ta, Tb, Tc, Td, ....., Th") represents a data block, and every non-leaf node("Ha, Hb, Hc, Hd, ....., Hh") is a hash of its child nodes' hashes. The top node("Habcdefgh"), known as the root hash, represents the entire dataset. Merkle trees are efficient for verifying data integrity and facilitating quick, secure data retrievals. In blockchain, Merkle trees enable efficient verification of transactions and the immutability of the entire chain by summarizing data in a compact and tamper-evident manner.

This is how Markle tree look like.

In this above sample image, you can see what the actual Merkle tree data structure looks like. As you can see "Ta, Tb, Tc, Td, ....., Th" in this image, this is a transaction that we did. Above the transaction, you can see another layer "Ha, Hb, Hc, Hd, ....., Hh" This is a hash of the below transaction that we did. Now the important points come. As you can see above there are three more layers so in the next layer we can see "Hab, Hcd, Hef, Hgh" This is the hash of the child hashes. When we concatenate Ha & Hb and find the hash of the concatenation form then we get Hab, When we concatenate Hc & Hd and find the hash of the concatenation form then we get Hcd, and this step same for the other hashes (Hef, Hgh).

Note the size of the transactions is not fixed. In my case there are only 8 transactions, this is only for explanation purposes.

In the case of "Habcd", we concatenate "Hab & Hcd" and find the hash using the sha256 hashing algorithm and this step same for the "Hefgh" case. In the end when we concatenate "Habcd & Hefgh" and find the hash of it then we get the "Habcdefgh" hash. This is called root hash.

MERKLE ROOT

A Merkle root is the top node/hash of the Merkle tree. In our case, the hash " Habcdefgh " is the root node.

MERKLE PROOF

A Merkle proof is also known as a Merkle path. The Merkle proof is a structure that holds the minimum required hashes/nodes of the Merkle tree for proving a specific hash in the tree. In our case let's say we want to check whether Ha belongs to Habcdefgh root or not. Then our Merkle proof holds an array of hashes that need to prove Ha belongs to Habcdefgh like [Ha, Hb, Hcd, Hefgh]. We only require these four hashs to check whether Ha belongs to Habcdegh or not. Now you think" How is it possible and why do we only need these four hashs? What about the other hash? ". The reason behind this is if we have Ha & Hb the throw these hashes we can construct the Hab hash, right now we have the Hab hash but to construct the Habcd hash we need the Hcd hash that's why we take Hcd instead of Hab. Now we have Habcd hash because we construct the hash from Hab & Hcd, but we need Hefgh to construct the root hash Habcdefgh that's why we take the Hefgh hash in our array. Now I hope you all get the idea of what I'm trying to say.

IMPLEMENTATION IN NODE JS

After too much theory part we finally came to the implementation part. I will try to explain the implementation part as much as simple by which you can understand. First, we import the crypto module because this module provides sha256, random bytes generator and other functionality that we will use in our Merkle tree implementation. The line of code that we write to import the crypto module is like this:-

import crypto from 'crypto';

Now we need to define two variable that is used for direction purposes(left & right). Why do we need this direction variable? because as I already told you about the Merkle proof we need the least hash to prove a particular hash is a part of this root hash. As you can see in the above image the pair Ha & Hb, Ha is on the left side and the Hb is on the right side. This is the same for the pair Hc & Hd, Hc lies on the left side and the Hd lies on the right and the list goes the same for layers 1,2,3, ..., n. This helps us to find what direction hash we need left or right. If we have the right hash then we need the left hash simple.

const Left = 'left';
const Right = 'right';

You see the Ha, Hb..... in the image but you might be wondering How the actual hash looks like. Don't worry now I will show you the actual hash. Here I am going to define a variable that contains the list of hashs that we are going to use for creating the Merkle tree, proving the Merkle proof, and finding the Markle root for these hashs.

const listOfHashes = [
'0x0b46cbd82d0fe12c84fd676f2f6e4df01793fba1a88b49c6563f0487b9c14285',
'0x6ef163a08119da6c77d64c32946ab80a0e6c91c34612856be8a201a3673919c2',
'0x578ee1b8ac3c48f984b15621a476a02cbf46be9ee4f0b4691c16476ce389a0d5',
'0x268a39246efd811aa6b9377910e7a13131de485641907e87fd31ca6bc36a03b1',
'0x7dff748603fdeba8b761df81c372390c4fc4a6006e1c74933b81bd54dd904073',
'0x0624dc85cea85575ed94a290b5c3b3a0caa8a1c2e888c287f0689d4e4f4d4ba5',
'0x4ff2928281aa0d7f48d025576f0bad4794eb26cadf06e177a1ca073ff451eec7',
'0xba7f0db9e9ef33eeea0ef0645738c28328fe5e95acdcb8812b96437c9fde0ef1',
'0xb840fa85ebb73022fc52965f67a987cd8be7b0e22d1e5b8e4c898e3687f360b0',
'0x10e2dae66b1eed88f323ba69066a9f5d9cf7d1ae8f67e20a39fc898a1f4aad73',
'0x717dfefbb729adb0fc1b8982dd345161b19613ece76737ba75400de41802c11c',
'0x2562ea7215f843b9d6cacb3449e9af8caf9bea6a9735791b8072263b1a883ba0',
'0xff693258f9bed50e203866937b5f6a1941eaea4633092ee539f3796cde1cd823',
'0x6e657fc4835f14a0f3dee237c4fb3d91f1a7105b752f66f573cfd165e32247e5',
'0x555d04168bbfda587aeda61805b3dd4dd70aa62613a8671dc871109b66081e19'
];

Define a function that helps us to find the sha256 hash of the data. We can achieve this by using a crypto module that we imported earlier. If we try to do it on our own then it is a very complicated and time-consuming process because we need to implement the sha256 algorithm on our own.

const sha256 = data => {
    return crypto.createHash('sha256').update(data).digest().toString('hex');
};

Define a function that finds the direction of the hash(left & right) that's why we already define two variables Left & Right. How can we get this direction? simple if the hash is on the even index that means it's on the left side or if the hash is on the odd index that means it's on the right side. If we get the index of the hash then we can find the modulo of that index with 2, if the modulo is equal to zero then it's Left otherwise it's Right.

const findDirectionOfHash = (hash, tree) => {
    const hashIndex = merkleTree[0].findIndex(h => h === hash);
    return hashIndex % 2 === 0 ? LEFT : RIGHT;
};

One more problem in the Merkle tree, What if the size of the hashes is odd, meaning Suppose there are only nine hashes then we start pairing two hashes with each other but the pair for the last index is not available, What do we do? So we create a function that checks the size of the hashes, if it's even then ok otherwise copy the last hash and paste it at the end of the array, simple. We get the pair for the ninth number hash. You can see in the below image and get an idea about this.

In the first image you can see E is only one left hash and there is no hash left to which E can concatenate and complete the process same in the EE case. So to solve this problem we simply double the hash E and EE case to complete the tree process. you can see in the second image. The code to achieve this is as follows:-

function isEven(hashes) {
    if(hashes.length % 2 !== 0) {
        hashes.push(hashes[hashes.length - 1]);
    }
}

It's time to calculate the Root of this given list of hash. Are you ready guys? I hope so. We need to define a function that calculates the root of the hashes for us. Let's go.

function FindRootOfHashes(hashes){
      if(!hashes || hashes.length == 0) {
        return ''; // If this condition match then it's return the empty string.
    }
    isEven(hashes);
    const listOfTheHashes = [];
    for(let i = 0; i < hashes.length; i += 2) {
    // Concatenate the two hash with each other
        const hashPairConcatenated = hashes[i] + hashes[i + 1];
    //Then after concatenation we finding the hash of the concatenation pair.
        const hash = sha256(hashPairConcatenated);
    //Then we pushing this hash to the combinedHashes array.
        listOfTheHashes.push(hash);
    }
    // If the listOfTheHashes length is 1, it means that we have the merkle root already
    // and we can return
    if(listOfTheHashes.length === 1) {
        return listOfTheHashes.join('');
    }
    return FindRootOfHashes(listOfTheHashes);
}

Here you need to understand the flow of the code. Let me explain to you. We have the array that contains the list of hashes that we defined already. This function takes that list of hashes and performs a task that we write inside the function. First, it's running the 'if' condition, that the length of the array is equal to zero or not. If yes then it returns the empty string otherwise it moves to the next line of code. In the next line of code, it calls another function "isEven()" to check whether the length of the array is Even or Odd. If the length of the array is odd then it doubles the last index hash pushes it inside the array and after that, comes back to the next line of code. In the next line, we create a new variable that contains the empty dynamic array. Why do we need to create this variable? Because inside this variable we are going to push the hashes that we get after finding the hash of each pair using sha256. In the next line, we are running the for loop that starts iterating from 0 to hashes.length-1 and increasing the value of I by two because finding the sha256 hash for two pairs, This is because we do that. After concatenating and finding the hash throw the sha256 algorithm we push that hash inside the 'listOfHashes' array that we defined earlier. When the loop is complete then we come out from the loop and move to the next line of code. In the next line, we are comparing the length of the 'list of hashes' because if the length is equal to one that means we get our root of hashes, if the length is not equal to one then we move to the next line. Inside that line, we are calling the FindRootOfHashes one more time and at that time we are sending the listOfHashes array, not the hashes array. We need to perform the same steps to get the one hash that is our root and perform the same steps again and again until we get our one hash inside the 'list of hashes'. If we get the length of the list of hashes equal to one then we simply end the function and return the final hash that is our root. So this is how we get our root hash I hope you understand the flow of the code.

It's time to see what is our outcome after running this function.

const rootHash = FindRootOfHashes(listOfHashes);
console.log('The root of this list of hashes:',rootHash);

The root of this list of hashes: d9e373ca7462868bc64ecefe14a2c13c954158981773f901702cd3351f2ba57a

This is our outcome, please check it one time from your side also.

Finding the Merkle root part is completed here, now let's move to the Merkle tree implementation part.

function findTreeOfHashes(hashes) {
    if(!hashes || hashes.length === 0) {
        return [];
    }
    const tree = [hashes];
    const createTree = (hashes, tree) => {
        if(hashes.length === 1) {
            return hashes;
        }
        isEven(hashes);
        const listOfTheHashes = [];
        for(let i = 0; i < hashes.length; i += 2) {
            const hashesConcatenated = hashes[i] + hashes[i + 1];
            const hash = sha256(hashesConcatenated);
            listOfTheHashes.push(hash);
        }
        tree.push(listOfTheHashes);
        return createTree(listOfTheHashes, tree);
    }
    createTree(hashes, tree);
    return tree;
}

I know you are telling yourself that, What is this? I can understand your feelings but guys if you understand the flow of this function then this code becomes a simple thing for you let's begin. At the start of the function the same if condition code that checks the length of the given list of hashes same as the previous root function, I think there is no problem with this line of code. In the next line, we did the same thing that we did previously like we are creating a variable that contains the list of hashes inside it. We created this variable because inside this array we are going to hold our whole tree, You can understand it as a 2D array means an array inside an array. In the next line, we create another function named "createTree" Inside the parameter we are passing the list of hashes as well as the tree. Now the next code till for loop is the same as the previous function, you can read from there because if I am going to repeat here then the size of this article becomes too long. In the next line, we push the array inside our tree variable and call the function one again until the size of the list of hashes is not equal to one. Outside this function we are calling the function for the first time I hope you have some miner knowledge about node js. In the end, we are simply returning our whole tree and inside that tree, we have our whole Merkle tree. Let's call this function and check the outcome of this.

const treeHashes = findTreeOfHashes(listOfHashes);
console.log('this is our Merkle tree:', treeHashes);

This is our outcome and it is too long.

This is our Merkle tree: [ 

[ '0x0b46cbd82d0fe12c84fd676f2f6e4df01793fba1a88b49c6563f0487b9c14285',
 '0x6ef163a08119da6c77d64c32946ab80a0e6c91c34612856be8a201a3673919c2',
 '0x578ee1b8ac3c48f984b15621a476a02cbf46be9ee4f0b4691c16476ce389a0d5',
 '0x268a39246efd811aa6b9377910e7a13131de485641907e87fd31ca6bc36a03b1',
 '0x7dff748603fdeba8b761df81c372390c4fc4a6006e1c74933b81bd54dd904073',
 '0x0624dc85cea85575ed94a290b5c3b3a0caa8a1c2e888c287f0689d4e4f4d4ba5',
 '0x4ff2928281aa0d7f48d025576f0bad4794eb26cadf06e177a1ca073ff451eec7',
 '0xba7f0db9e9ef33eeea0ef0645738c28328fe5e95acdcb8812b96437c9fde0ef1',
 '0xb840fa85ebb73022fc52965f67a987cd8be7b0e22d1e5b8e4c898e3687f360b0',
 '0x10e2dae66b1eed88f323ba69066a9f5d9cf7d1ae8f67e20a39fc898a1f4aad73',
 '0x717dfefbb729adb0fc1b8982dd345161b19613ece76737ba75400de41802c11c',
 '0x2562ea7215f843b9d6cacb3449e9af8caf9bea6a9735791b8072263b1a883ba0',
 '0xff693258f9bed50e203866937b5f6a1941eaea4633092ee539f3796cde1cd823',
 '0x6e657fc4835f14a0f3dee237c4fb3d91f1a7105b752f66f573cfd165e32247e5',
 '0x555d04168bbfda587aeda61805b3dd4dd70aa62613a8671dc871109b66081e19',
 '0x555d04168bbfda587aeda61805b3dd4dd70aa62613a8671dc871109b66081e19' ],

[ '51fe46cbc3cae464b94d7f51fe5854520e5a50b81d582abe27d025fc610cf2a3',
 '8dcbdd3d6474719dc8e1faf4a922a84783560442bb6245fce705605747b9d5ff',
 'c4b210cec6adbf2a9ac9aaf3473e1231cf3bc830ffcd3236b8727ac6800e1f69',
 'c623904a923b8d31ff1fd80311553dc35978d64d7b55d9095f1bc57e788e36bf',
 '58f0b429e67192eba97a43650e0ffd9d73a3306e26ab3843d9f77962bc783a08',
 'be30349d91afdd8dd2414e8c19cf6d34fc4887fa3aeb658a06e77a709fd1ff85',
 '833e8957094c9010f084342ba7d96163fe2e1d09075ceb1bb562b6b39270fc15',
 'c57509171c1bc365eb921131a206430dcb5f14f0882067ded6f25ee3b761f3ca' ],

 [ 'fdaeb371ce597102fea90f6a2d3044132fd87566b48eccbf5dcb48c9a5994214',
 'e72bc2faf42326f2c599eeb8a644df144aee84b8f35a9503b0c03a4d5b00f8e0',
 '8cc28b64770f06d4ba5c191309958486bff886ac34f9c30cfb80de72afa8cb91',
 '11873e6aa3024d9c4f1237a8dab59c68f8fc890d65643c8b69e2bf47e353a126' ],

 [ 'ed7be99a093f6336ce28511e1dca2968559da8ee2993cd42e06e934272829c5d',
 '1fa1425ab0b2d77add1942ecfb7c0e8bd98870f6d799d001a959066bb410fd13' ], 

[ 'd9e373ca7462868bc64ecefe14a2c13c954158981773f901702cd3351f2ba57a' ] ] //root

We finally completed our two main functions. Let's move to the next function which is our proof of whether the hash belongs to this tree or not. If you still read this article, thank you. I explain each line in the comments so don't forget to read it.

function findTheMerkleProof(hash, hashes) {
// This candition is checking the size of the hashes array as well as checking
// that hash and hashes is given or not. If the candition matches then it will
//return null.
    if(!hash || !hashes || hashes.length === 0) {
        return null;
    }
// Here we are creating a variable and call our findTreeOfHashes function with
//the given hashes array. The tree variable holds the entire Markle tree.
    const tree = findTreeOfHashes(hashes);
// This variable hold the hash as well as the direction of the hash. It is a type
// of array that contains object inside it.
    const merkleProof = [{
        hash,
// Here we are call the findDirectionOFHash function that we define earlier.
        direction: findDirectionOfHash(hash, tree)
    }];
// Here we finding the index of the hash because it is importand to find the 
// patner of this hash.
    let hashIndex = tree[0].findIndex(h => h === hash);
// for loop for finding the sibling direction as well as the index and after 
// finding this things we are creating siblingNode name variable that is object
// type and hold the hash and the direction of sibling.
    for(let level = 0; level < tree.length - 1; level++) {
        const isLeftChild = hashIndex % 2 === 0;
        const siblingDirection = isLeftChild ? RIGHT : LEFT;
        const siblingIndex = isLeftChild ? hashIndex + 1 : hashIndex - 1;
        const siblingNode = {
            hash: tree[level][siblingIndex],
            direction: siblingDirection
        };
// Now we are pushing the siblingNode values inside the merkleProof array that
// we defind above the hashIndex.
        merkleProof.push(siblingNode);
// Here we are trying to find the index of this hash parent like if the recent 
// hash index is 4 if we divide it with 2 then we get 2. So, the parent hash index
// is 2 simple. We are using Math.floor because if we have odd number index like 5
// then we try to divide it with two then the outcome is 2.5 but ther is no index like
// this in our array so Math.floor is round up the value 2.5 to 2.
        hashIndex = Math.floor(hashIndex / 2);
    }
// In the end we are simply returning the whole merkleProof array.
    return merkleProof;
}

Let's call this function in our console.

// listOfHashes[4] means find proof for index number 4 hash.
const findMerkleProof = findTheMerkleProof(listOfHashes[4], listOfHashes);
console.log('findMerkleProof: ', findMerkleProof);

This is our outcome

findMerkleProof:  [
  {
    hash: '0x7dff748603fdeba8b761df81c372390c4fc4a6006e1c74933b81bd54dd904073',
    direction: 'left'
  },
  {
    hash: '0x0624dc85cea85575ed94a290b5c3b3a0caa8a1c2e888c287f0689d4e4f4d4ba5',
    direction: 'right'
  },
  {
    hash: 'c623904a923b8d31ff1fd80311553dc35978d64d7b55d9095f1bc57e788e36bf',
    direction: 'right'
  },
  {
    hash: 'fdaeb371ce597102fea90f6a2d3044132fd87566b48eccbf5dcb48c9a5994214',
    direction: 'left'
  },
  {
    hash: '1fa1425ab0b2d77add1942ecfb7c0e8bd98870f6d799d001a959066bb410fd13',
    direction: 'right'
  }
]

That's all for today guys I know this is too long but I try to explain everything easily that's why it is too long. Thank you for reading till the end see you soon. Keep reading, Keep learning, Keep growing.