Data structures and encoding (Deep Dive)

The Electroneum Smart Chain creates, stores and transfers large volumes of data. This data must get formatted in standardised and memory-efficient ways to allow anyone to run a node on relatively modest consumer-grade hardware. To achieve this, several specific data structures are used on the Electroneum Smart Chain stack.

Prerequisites
You should understand the fundamentals of Electroneum and client software. Familiarity with the networking layer and the Ethereum whitepaper is recommended.

Data structures
Patricia merkle tries
Patricia Merkle Tries are structures that encode key-value pairs into a deterministic and cryptographically authenticated trie. These are used extensively across the Electroneum Smart Chain.

More on Patricia Merkle Tries below

Recursive Length Prefix
Recursive Length Prefix (RLP) is a serialisation method used extensively across the Electroneum Smart Chain.

More on RLP below

Patricia Merkle Trie

A Merkle Patricia Trie provides a cryptographically authenticated data structure that can be used to store all (key, value) bindings.

Merkle Patricia Tries are fully deterministic, meaning that tries with the same (key, value) bindings are guaranteed to be identical—down to the last byte. This means that they have the same root hash, providing the holy grail of O(log(n)) efficiency for inserts, lookups and deletes. Moreover, they are simpler to understand and code than more complex comparison-based alternatives, like red-black trees.

Prerequisites

To better understand this page, it would be helpful to have basic knowledge of hashes↗, Merkle trees↗, tries↗ and serialisation↗.

Basic radix tries

In a basic radix trie, every node looks as follows:

Copy

[i_0, i_1 ... i_n, value]

Where i_0 ... i_n represent the symbols of the alphabet (often binary or hex), value is the terminal value at the node, and the values in the i_0, i_1 ... i_n slots are either NULL or pointers to (in our case, hashes of) other nodes. This forms a basic (key, value) store.

Say you wanted to use a radix tree data structure for persisting an order over a set of key value pairs. To find the value currently mapped to the key dog in the trie, you would first convert dog into letters of the alphabet (giving 64 6f 67), and then descend the trie following that path until you find the value. That is, you start by looking up the root hash in a flat key/value DB to find the root node of the trie. It is represented as an array of keys pointing to other nodes. You would use the value at index 6 as a key and look it up in the flat key/value DB to get the node one level down. Then pick index 4 to look up the next value, then pick index 6, and so on, until, once you followed the path: root -> 6 -> 4 -> 6 -> 15 -> 6 -> 7, you would look up the value of the node and return the result.

There is a difference between looking something up in the ‘trie’ and the underlying flat key/value ‘DB’. They both define key/value arrangements, but the underlying DB can do a traditional 1 step lookup of a key. Looking up a key in the trie requires multiple underlying DB lookups to get to the final value described above. Let’s refer to the latter as a path to eliminate ambiguity.

The update and delete operations for radix tries can be defined as follows:

Copy

    def update(node,path,value):
        if path == '':
            curnode = db.get(node) if node else [ NULL ] * 17
            newnode = curnode.copy()
            newnode[-1] = value
        else:
            curnode = db.get(node) if node else [ NULL ] * 17
            newnode = curnode.copy()
            newindex = update(curnode[path[0]],path[1:],value)
            newnode[path[0]] = newindex
        db.put(hash(newnode),newnode)
        return hash(newnode)

    def delete(node,path):
        if node is NULL:
            return NULL
        else:
            curnode = db.get(node)
            newnode = curnode.copy()
            if path == '':
                newnode[-1] = NULL
            else:
                newindex = delete(curnode[path[0]],path[1:])
                newnode[path[0]] = newindex

            if all(x is NULL for x in newnode):
                return NULL
            else:
                db.put(hash(newnode),newnode)
                return hash(newnode)
                

A “Merkle” Radix tree is built by linking nodes using deterministically-generated cryptographic hash digests. This content-addressing (in the key/value DB key == keccak256(rlp(value))) provides a cryptographic integrity guarantee of the stored data. If the root hash of a given trie is publicly known, then anyone with access to the underlying leaf data can construct a proof that the trie includes a given value at a specific path by providing the hashes of each node joining a specific value to the tree root.

It is impossible for an attacker to provide a proof of a (path, value) pair that does not exist since the root hash is ultimately based on all hashes below it. Any underlying modification would change the root hash. You can think of the hash as a compressed representation of structural information about the data, secured by the pre-image protection of the hashing function.

We’ll refer to an atomic unit of a radix tree (e.g. a single hex character, or 4 bit binary number) as a “nibble”. While traversing a path one nibble at a time, as described above, nodes can maximally refer to 16 children but include a value element. We, hence, represent them as an array of length 17. We call these 17-element arrays “branch nodes”.

Merkle Patricia Trie

Radix tries have one major limitation: they are inefficient. If you want to store one (path, value) binding where the path, like in Electroneum, is 64 characters long (the number of nibbles in bytes32), we will need over a kilobyte of extra space to store one level per character, and each lookup or delete will take the full 64 steps. The Patricia trie introduced in the following solves this issue.

Optimization

A node in a Merkle Patricia trie is one of the following:

  1. NULL (represented as the empty string)

  2. branch A 17-item node [ v0 ... v15, vt ]

  3. leaf A 2-item node [ encodedPath, value ]

  4. extension A 2-item node [ encodedPath, key ]

With 64 character paths it is inevitable that after traversing the first few layers of the trie, you will reach a node where no divergent path exists for at least part of the way down. To avoid having to create up to 15 sparse NULL nodes along the path, we shortcut the descent by setting up an extension node of the form [ encodedPath, key ], where encodedPath contains the “partial path” to skip ahead (using a compact encoding described below), and the key is for the next DB lookup.

For a leaf node, which can be marked by a flag in the first nibble of the encodedPath, the path encodes all prior node’s path fragments and we can look up the value directly.

This above optimization, however, introduces ambiguity.

When traversing paths in nibbles, we may end up with an odd number of nibbles to traverse, but because all data is stored in bytes format. It is not possible to differentiate between, for instance, the nibble 1, and the nibbles 01 (both must be stored as <01>). To specify odd length, the partial path is prefixed with a flag.

Specification: Compact encoding of hex sequence with optional terminator

The flagging of both odd vs. even remaining partial path length and leaf vs. extension node as described above reside in the first nibble of the partial path of any 2-item node. They result in the following:

Copy

hex char    bits    |    node type partial     path length
----------------------------------------------------------
   0        0000    |       extension              even
   1        0001    |       extension              odd
   2        0010    |   terminating (leaf)         even
   3        0011    |   terminating (leaf)         odd
   

For even remaining path length (0 or 2), another 0 “padding” nibble will always follow.

Copy

    def compact_encode(hexarray):
        term = 1 if hexarray[-1] == 16 else 0
        if term: hexarray = hexarray[:-1]
        oddlen = len(hexarray) % 2
        flags = 2 * term + oddlen
        if oddlen:
            hexarray = [flags] + hexarray
        else:
            hexarray = [flags] + [0] + hexarray
        // hexarray now has an even length whose first nibble is the flags.
        o = ''
        for i in range(0,len(hexarray),2):
            o += chr(16 * hexarray[i] + hexarray[i+1])
        return o

Examples:

Copy

    > [ 1, 2, 3, 4, 5, ...]
    '11 23 45'
    > [ 0, 1, 2, 3, 4, 5, ...]
    '00 01 23 45'
    > [ 0, f, 1, c, b, 8, 10]
    '20 0f 1c b8'
    > [ f, 1, c, b, 8, 10]
    '3f 1c b8'
    

Here is the extended code for getting a node in the Merkle Patricia trie:

Copy

    def get_helper(node,path):
        if path == []: return node
        if node = '': return ''
        curnode = rlp.decode(node if len(node) < 32 else db.get(node))
        if len(curnode) == 2:
            (k2, v2) = curnode
            k2 = compact_decode(k2)
            if k2 == path[:len(k2)]:
                return get(v2, path[len(k2):])
            else:
                return ''
        elif len(curnode) == 17:
            return get_helper(curnode[path[0]],path[1:])

    def get(node,path):
        path2 = []
        for i in range(len(path)):
            path2.push(int(ord(path[i]) / 16))
            path2.push(ord(path[i]) % 16)
        path2.push(16)
        return get_helper(node,path2)

Example Trie

Suppose we want a trie containing four path/value pairs ('do', 'verb'), ('dog', 'puppy'), ('doge', 'coin'), ('horse', 'stallion').

First, we convert both paths and values to bytes. Below, actual byte representations for paths are denoted by <>, although values are still shown as strings, denoted by '', for easier comprehension (they, too, would actually be bytes):

Copy

    <64 6f> : 'verb'
    <64 6f 67> : 'puppy'
    <64 6f 67 65> : 'coin'
    <68 6f 72 73 65> : 'stallion'

Now, we build such a trie with the following key/value pairs in the underlying DB:

Copy

    rootHash: [ <16>, hashA ]
    hashA:    [ <>, <>, <>, <>, hashB, <>, <>, <>, [ <20 6f 72 73 65>, 'stallion' ], <>, <>, <>, <>, <>, <>, <>, <> ]
    hashB:    [ <00 6f>, hashD ]
    hashD:    [ <>, <>, <>, <>, <>, <>, hashE, <>, <>, <>, <>, <>, <>, <>, <>, <>, 'verb' ]
    hashE:    [ <17>, [ <>, <>, <>, <>, <>, <>, [ <35>, 'coin' ], <>, <>, <>, <>, <>, <>, <>, <>, <>, 'puppy' ] ]

When one node is referenced inside another node, what is included is H(rlp.encode(x)), where H(x) = keccak256(x) if len(x) >= 32 else x and rlp.encode is the RLP encoding function.

Note that when updating a trie, one needs to store the key/value pair (keccak256(x), x) in a persistent lookup table if the newly-created node has length >= 32. However, if the node is shorter than that, one does not need to store anything, since the function f(x) = x is reversible.

Tries in Electroneum Smart Chain

All of the merkle tries in Electroneum Smart Chain use a Merkle Patricia Trie.

From a block header there are 3 roots from 3 of these tries.

  1. stateRoot

  2. transactionsRoot

  3. receiptsRoot

State Trie

There is one global state trie, and it is updated every time a client processes a block. In it, a path is always: keccak256(electroneumAddress) and a value is always: rlp(electroneumAccount). More specifically an electroneum account is a 4 item array of [nonce,balance,storageRoot,codeHash]. At this point, it’s worth noting that this storageRoot is the root of another patricia trie:

Storage Trie

Storage trie is where all contract data lives. There is a separate storage trie for each account. To retrieve values at specific storage positions at a given address the storage address, integer position of the stored data in the storage, and the block ID are required. These can then be passed as arguments to the eth_getStorageAt defined in the JSON-RPC API, e.g. to retrieve the data in storage slot 0 for address 0x295a70b2de5e3953354a6a8344e616ed314d7251:

Copy

curl -X POST --data '{"jsonrpc":"2.0", "method": "eth_getStorageAt", "params": ["0x295a70b2de5e3953354a6a8344e616ed314d7251", "0x0", "latest"], "id": 1}' localhost:8545

{"jsonrpc":"2.0","id":1,"result":"0x00000000000000000000000000000000000000000000000000000000000004d2"}

Retrieving other elements in storage is slightly more involved because the position in the storage trie must first be calculated. The position is calculated as the keccak256 hash of the address and the storage position, both left-padded with zeros to a length of 32 bytes. For example, the position for the data in storage slot 1 for address 0x391694e7e0b0cce554cb130d723a9d27458f9298 is:

Copy

keccak256(decodeHex("000000000000000000000000391694e7e0b0cce554cb130d723a9d27458f9298" + "0000000000000000000000000000000000000000000000000000000000000001"))

In a Etn-sc console, this can be calculated as follows:

Copy

> var key = "000000000000000000000000391694e7e0b0cce554cb130d723a9d27458f9298" + "0000000000000000000000000000000000000000000000000000000000000001"
undefined
> web3.sha3(key, {"encoding": "hex"})
"0x6661e9d6d8b923d5bbaab1b96e1dd51ff6ea2a93520fdc9eb75d059238b8c5e9"

The path is therefore keccak256(<6661e9d6d8b923d5bbaab1b96e1dd51ff6ea2a93520fdc9eb75d059238b8c5e9>). This can now be used to retrieve the data from the storage trie as before:

Copy

curl -X POST --data '{"jsonrpc":"2.0", "method": "eth_getStorageAt", "params": ["0x295a70b2de5e3953354a6a8344e616ed314d7251", "0x6661e9d6d8b923d5bbaab1b96e1dd51ff6ea2a93520fdc9eb75d059238b8c5e9", "latest"], "id": 1}' localhost:8545

{"jsonrpc":"2.0","id":1,"result":"0x000000000000000000000000000000000000000000000000000000000000162e"}

Note: The storageRoot for an Electroneum account is empty by default if it’s not a contract account.

Transactions Trie

There is a separate transactions trie for every block, again storing (key, value) pairs. A path here is: rlp(transactionIndex) which represents the key that corresponds to a value determined by:

Copy

if legacyTx:
  value = rlp(tx)
else:
  value = TxType | encode(tx)

More information on this can be found in the EIP 2718↗ documentation.

Receipts Trie

Every block has its own Receipts trie. A path here is: rlp(transactionIndex). transactionIndex is its index within the block it’s mined. The receipts trie is never updated. Similar to the Transactions trie, there are current and legacy receipts. To query a specific receipt in the Receipts trie, the index of the transaction in its block, the receipt payload and the transaction type are required. The Returned receipt can be of type Receipt which is defined as the concentenation of TransactionType and ReceiptPayload or it can be of type LegacyReceipt which is defined as rlp([status, cumulativeGasUsed, logsBloom, logs]).

More information on this can be found in the EIP 2718↗ documentation.

Further reading

Recursive-length prefix (RLP)

Recursive Length Prefix (RLP) serialisation is used extensively in the Electroneum Smart Chain. RLP standardises the transfer of data between nodes in a space-efficient format. The purpose of RLP is to encode arbitrarily nested arrays of binary data, and RLP is the primary encoding method used to serialise objects in the Electroneum Smart Chain. The only purpose of RLP is to encode structure; encoding specific data types (e.g. strings, floats) is left up to higher-order protocols; but positive RLP integers must be represented in big-endian binary form with no leading zeroes (thus making the integer value zero equivalent to the empty byte array). Deserialized positive integers with leading zeroes get treated as invalid. The integer representation of string length must also be encoded this way, as well as integers in the payload.

More information in the Ethereum yellow paper (Appendix B):arrow_upper_right:.

To use RLP to encode a dictionary, the two suggested canonical forms are:

  • use [[k1,v1],[k2,v2]...] with keys in lexicographic order

  • use the higher-level Patricia Tree encoding as Electroneum does

Definition

The RLP encoding function takes in an item. An item is defined as follows:

  • a string (i.e. byte array) is an item

  • a list of items is an item

For example, all of the following are items:

  • an empty string;

  • the string containing the word “cat”;

  • a list containing any number of strings;

  • and a more complex data structures like ["cat", ["puppy", "cow"], "horse", [[]], "pig", [""], "sheep"].

Note that in the context of the rest of this page, ‘string’ means “a certain number of bytes of binary data”; no special encodings are used, and no knowledge about the content of the strings is implied.

RLP encoding is defined as follows:

  • For a single byte whose value is in the [0x00, 0x7f] (decimal [0, 127]) range, that byte is its own RLP encoding.

  • Otherwise, if a string is 0-55 bytes long, the RLP encoding consists of a single byte with value 0x80 (dec. 128) plus the length of the string followed by the string. The range of the first byte is thus [0x80, 0xb7] (dec. [128, 183]).

  • If a string is more than 55 bytes long, the RLP encoding consists of a single byte with value 0xb7 (dec. 183) plus the length in bytes of the length of the string in binary form, followed by the length of the string, followed by the string. For example, a 1024 byte long string would be encoded as \xb9\x04\x00 (dec. 185, 4, 0) followed by the string. Here, 0xb9 (183 + 2 = 185) as the first byte, followed by the 2 bytes 0x0400 (dec. 1024) that denote the length of the actual string. The range of the first byte is thus [0xb8, 0xbf] (dec. [184, 191]).

  • If the total payload of a list (i.e. the combined length of all its items being RLP encoded) is 0-55 bytes long, the RLP encoding consists of a single byte with value 0xc0 plus the length of the list followed by the concatenation of the RLP encodings of the items. The range of the first byte is thus [0xc0, 0xf7] (dec. [192, 247]).

  • If the total payload of a list is more than 55 bytes long, the RLP encoding consists of a single byte with value 0xf7 plus the length in bytes of the length of the payload in binary form, followed by the length of the payload, followed by the concatenation of the RLP encodings of the items. The range of the first byte is thus [0xf8, 0xff] (dec. [248, 255]).

In code, this is:

Copy

def rlp_encode(input):
    if isinstance(input,str):
        if len(input) == 1 and ord(input) < 0x80:
            return input
        return encode_length(len(input), 0x80) + input
    elif isinstance(input, list):
        output = ''
        for item in input:
            output += rlp_encode(item)
        return encode_length(len(output), 0xc0) + output

def encode_length(L, offset):
    if L < 56:
         return chr(L + offset)
    elif L < 256**8:
         BL = to_binary(L)
         return chr(len(BL) + offset + 55) + BL
     raise Exception("input too long")

def to_binary(x):
    if x == 0:
        return ''
    return to_binary(int(x / 256)) + chr(x % 256)

Examples

  • the string “dog” = [ 0x83, ‘d’, ‘o’, ‘g’ ]

  • the list [ “cat”, “dog” ] = [ 0xc8, 0x83, 'c', 'a', 't', 0x83, 'd', 'o', 'g' ]

  • the empty string (‘null’) = [ 0x80 ]

  • the empty list = [ 0xc0 ]

  • the integer 0 = [ 0x80 ]

  • the encoded integer 0 (‘\x00’) = [ 0x00 ]

  • the encoded integer 15 (‘\x0f’) = [ 0x0f ]

  • the encoded integer 1024 (‘\x04\x00’) = [ 0x82, 0x04, 0x00 ]

  • the set theoretical representation↗ of three, [ [], [[]], [ [], [[]] ] ] = [ 0xc7, 0xc0, 0xc1, 0xc0, 0xc3, 0xc0, 0xc1, 0xc0 ]

  • the string “Lorem ipsum dolor sit amet, consectetur adipisicing elit” = [ 0xb8, 0x38, 'L', 'o', 'r', 'e', 'm', ' ', ... , 'e', 'l', 'i', 't' ]

RLP decoding

According to the rules and process of RLP encoding, the input of RLP decode is regarded as an array of binary data. The RLP decoding process is as follows:

  1. according to the first byte (i.e. prefix) of input data and decoding the data type, the length of the actual data and offset;

  2. according to the type and offset of data, decode the data correspondingly;

  3. continue to decode the rest of the input;

Among them, the rules of decoding data types and offset is as follows:

  1. the data is a string if the range of the first byte (i.e. prefix) is [0x00, 0x7f], and the string is the first byte itself exactly;

  2. the data is a string if the range of the first byte is [0x80, 0xb7], and the string whose length is equal to the first byte minus 0x80 follows the first byte;

  3. the data is a string if the range of the first byte is [0xb8, 0xbf], and the length of the string whose length in bytes is equal to the first byte minus 0xb7 follows the first byte, and the string follows the length of the string;

  4. the data is a list if the range of the first byte is [0xc0, 0xf7], and the concatenation of the RLP encodings of all items of the list which the total payload is equal to the first byte minus 0xc0 follows the first byte;

  5. the data is a list if the range of the first byte is [0xf8, 0xff], and the total payload of the list whose length is equal to the first byte minus 0xf7 follows the first byte, and the concatenation of the RLP encodings of all items of the list follows the total payload of the list;

In code, this is:

Copy

def rlp_decode(input):
    if len(input) == 0:
        return
    output = ''
    (offset, dataLen, type) = decode_length(input)
    if type is str:
        output = instantiate_str(substr(input, offset, dataLen))
    elif type is list:
        output = instantiate_list(substr(input, offset, dataLen))
    output += rlp_decode(substr(input, offset + dataLen))
    return output

def decode_length(input):
    length = len(input)
    if length == 0:
        raise Exception("input is null")
    prefix = ord(input[0])
    if prefix <= 0x7f:
        return (0, 1, str)
    elif prefix <= 0xb7 and length > prefix - 0x80:
        strLen = prefix - 0x80
        return (1, strLen, str)
    elif prefix <= 0xbf and length > prefix - 0xb7 and length > prefix - 0xb7 + to_integer(substr(input, 1, prefix - 0xb7)):
        lenOfStrLen = prefix - 0xb7
        strLen = to_integer(substr(input, 1, lenOfStrLen))
        return (1 + lenOfStrLen, strLen, str)
    elif prefix <= 0xf7 and length > prefix - 0xc0:
        listLen = prefix - 0xc0;
        return (1, listLen, list)
    elif prefix <= 0xff and length > prefix - 0xf7 and length > prefix - 0xf7 + to_integer(substr(input, 1, prefix - 0xf7)):
        lenOfListLen = prefix - 0xf7
        listLen = to_integer(substr(input, 1, lenOfListLen))
        return (1 + lenOfListLen, listLen, list)
    raise Exception("input does not conform to RLP encoding form")

def to_integer(b):
    length = len(b)
    if length == 0:
        raise Exception("input is null")
    elif length == 1:
        return ord(b[0])
    return ord(substr(b, -1)) + to_integer(substr(b, 0, -1)) * 256

Further reading

Web3 secret storage definition

To make your app work on the Electroneum Smart Chain, you can use the web3 object provided by the web3.js library. Under the hood it communicates to a local node through RPC calls. web3↗ works with any Electroneum Smart Chain node which exposes an RPC layer.

web3 contains the eth object - web3.eth.

Copy

var fs = require("fs")
var recognizer = require("ethereum-keyfile-recognizer")

fs.readFile("keyfile.json", (err, data) => {
  var json = JSON.parse(data)
  var result = recognizer(json)
})

/** result
 *               [ 'web3', 3 ]   web3 (v3) keyfile
 *  [ 'ethersale', undefined ]   Ethersale keyfile
 *                        null     invalid keyfile
 */

This documents version 3 of the Web3 Secret Storage Definition.

Definition

The actual encoding and decoding of the file remains largely unchanged from version 1, except that the crypto algorithm is no longer fixed to AES-128-CBC (AES-128-CTR is now the minimal requirement). Most of the meanings/algorithm are similar to version 1, except mac, which is given as the SHA3 (keccak-256) of the concatenations of the second-leftmost 16 bytes of the derived key together with the full ciphertext.

Secret key files are stored directly in ~/.web3/keystore (for Unix-like systems) and ~/AppData/Web3/keystore (for Windows). They may be named anything, but a good convention is <uuid>.json, where <uuid> is the 128-bit UUID given to the secret key (a privacy-preserving proxy for the secret key’s address).

All such files have an associated password. To derive a given .json file’s secret key, first derive the file’s encryption key; this is done through taking the file’s password and passing it through a key derivation function as described by the kdf key. KDF-dependent static and dynamic parameters to the KDF function are described in kdfparams key.

PBKDF2 must be supported by all minimally-compliant implementations, denoted though:

  • kdf: pbkdf2

For PBKDF2, the kdfparams include:

  • prf: Must be hmac-sha256 (may be extended in the future);

  • c: number of iterations;

  • salt: salt passed to PBKDF;

  • dklen: length for the derived key. Must be >= 32.

Once the file’s key has been derived, it should be verified through the derivation of the MAC. The MAC should be calculated as the SHA3 (keccak-256) hash of the byte array formed as the concatenations of the second-leftmost 16 bytes of the derived key with the ciphertext key’s contents, i.e.:

Copy

KECCAK(DK[16..31] ++ <ciphertext>)

(where ++ is the concatenation operator)

This value should be compared to the contents of the mac key; if they are different, an alternative password should be requested (or the operation cancelled).

After the file’s key has been verified, the cipher text (the ciphertext key in the file) may be decrypted using the symmetric encryption algorithm specified by the cipher key and parameterised through the cipherparams key. If the derived key size and the algorithm’s key size are mismatched, the zero padded, rightmost bytes of the derived key should be used as the key to the algorithm.

All minimally-compliant implementations must support the AES-128-CTR algorithm, denoted through:

  • cipher: aes-128-ctr

This cipher takes the following parameters, given as keys to the cipherparams key:

  • iv: 128-bit initialisation vector for the cipher.

The key for the cipher is the leftmost 16 bytes of the derived key, i.e. DK[0..15]

The creation/encryption of a secret key should be essentially the reverse of these instructions. Make sure the uuid, salt and iv are actually random.

In addition to the version field, which should act as a “hard” identifier of version, implementations may also use minorversion to track smaller, non-breaking changes to the format.

Test vectors

Details:

  • Address: 008aeeda4d805471df9b2a5b0f38a0c3bcba786b

  • ICAP: XE542A5PZHH8PYIZUBEJEO0MFWRAPPIL67

  • UUID: 3198bc9c-6672-5ab3-d9954942343ae5b6

  • Password: testpassword

  • Secret: 7a28b5ba57c53603b0b07b56bba752f7784bf506fa95edc395f5cf6c7514fe9d

PBKDF2-SHA-256

Test vector using AES-128-CTR and PBKDF2-SHA-256:

File contents of ~/.web3/keystore/3198bc9c-6672-5ab3-d9954942343ae5b6.json:

Copy

{
  "crypto": {
    "cipher": "aes-128-ctr",
    "cipherparams": {
      "iv": "6087dab2f9fdbbfaddc31a909735c1e6"
    },
    "ciphertext": "5318b4d5bcd28de64ee5559e671353e16f075ecae9f99c7a79a38af5f869aa46",
    "kdf": "pbkdf2",
    "kdfparams": {
      "c": 262144,
      "dklen": 32,
      "prf": "hmac-sha256",
      "salt": "ae3cd4e7013836a3df6bd7241b12db061dbe2c6785853cce422d148a624ce0bd"
    },
    "mac": "517ead924a9d0dc3124507e3393d175ce3ff7c1e96529c6c555ce9e51205e9b2"
  },
  "id": "3198bc9c-6672-5ab3-d995-4942343ae5b6",
  "version": 3
}

Intermediates:

Derived key: f06d69cdc7da0faffb1008270bca38f5e31891a3a773950e6d0fea48a7188551 MAC Body: e31891a3a773950e6d0fea48a71885515318b4d5bcd28de64ee5559e671353e16f075ecae9f99c7a79a38af5f869aa46 MAC: 517ead924a9d0dc3124507e3393d175ce3ff7c1e96529c6c555ce9e51205e9b2 Cipher key: f06d69cdc7da0faffb1008270bca38f5

Scrypt

Test vector using AES-128-CTR and Scrypt:

Copy

{
  "crypto": {
    "cipher": "aes-128-ctr",
    "cipherparams": {
      "iv": "83dbcc02d8ccb40e466191a123791e0e"
    },
    "ciphertext": "d172bf743a674da9cdad04534d56926ef8358534d458fffccd4e6ad2fbde479c",
    "kdf": "scrypt",
    "kdfparams": {
      "dklen": 32,
      "n": 262144,
      "p": 8,
      "r": 1,
      "salt": "ab0c7876052600dd703518d6fc3fe8984592145b591fc8fb5c6d43190334ba19"
    },
    "mac": "2103ac29920d71da29f15d75b4a16dbe95cfd7ff8faea1056c33131d846e3097"
  },
  "id": "3198bc9c-6672-5ab3-d995-4942343ae5b6",
  "version": 3
}

Intermediates:

Derived key: fac192ceb5fd772906bea3e118a69e8bbb5cc24229e20d8766fd298291bba6bd MAC Body: bb5cc24229e20d8766fd298291bba6bdd172bf743a674da9cdad04534d56926ef8358534d458fffccd4e6ad2fbde479c MAC: 2103ac29920d71da29f15d75b4a16dbe95cfd7ff8faea1056c33131d846e3097 Cipher key: fac192ceb5fd772906bea3e118a69e8b

ALTERATIONS FROM VERSION 1

This version fixes several inconsistencies with the version 1 published here↗. In brief these are:

  • Capitalisation is unjustified and inconsistent (scrypt lowercase, Kdf mixed-case, MAC uppercase).

  • Address unnecessary and compromises privacy.

  • Salt is intrinsically a parameter of the key derivation function and deserves to be associated with it, not with the crypto in general.

  • SaltLen unnecessary (just derive it from Salt).

  • The key derivation function is given, yet the crypto algorithm is hard specified.

  • Version is intrinsically numeric yet is a string (structured versioning would be possible with a string, but can be considered out of scope for a rarely changing configuration file format).

  • KDF and cipher are notionally sibling concepts yet are organised differently.

  • MAC is calculated through a whitespace agnostic piece of data(!)

Changes have been made to the format to give the following file, functionally equivalent to the example given on the previously linked page:

Copy

{
  "crypto": {
    "cipher": "aes-128-cbc",
    "ciphertext": "07533e172414bfa50e99dba4a0ce603f654ebfa1ff46277c3e0c577fdc87f6bb4e4fe16c5a94ce6ce14cfa069821ef9b",
    "cipherparams": {
      "iv": "16d67ba0ce5a339ff2f07951253e6ba8"
    },
    "kdf": "scrypt",
    "kdfparams": {
      "dklen": 32,
      "n": 262144,
      "p": 1,
      "r": 8,
      "salt": "06870e5e6a24e183a5c807bd1c43afd86d573f7db303ff4853d135cd0fd3fe91"
    },
    "mac": "8ccded24da2e99a11d48cda146f9cc8213eb423e2ea0d8427f41c3be414424dd",
    "version": 1
  },
  "id": "0498f19a-59db-4d54-ac95-33901b4f1870",
  "version": 2
}

ALTERATIONS FROM VERSION 2

Version 2 was an early C++ implementation with a number of bugs. All essentials remain unchanged from it.

PreviousRecursive-length prefix (RLP)NextIntro to design and UX