Bitcoin and Ethereum Lack Data Range Search Functionality?

김인근
7 min readApr 1, 2024

--

Ingeun Kim | ingeun92@naver.com | CURG

This article was originally written in Korean, and you can find it here !!

We have learned that a blockchain is a peer-to-peer (P2P) based distributed data storage environment, and others have also become accustomed to the notion that a blockchain is a system for storing data. This is indeed true. Fundamentally, a blockchain is a database that stores transactions sent by multiple users in structures called blocks, validates their validity, reaches consensus on them, and connects them into a chain of blocks. In this context, while a blockchain is a P2P distributed environment, it can essentially be viewed as a database.

Given this understanding, the title may seem a bit strange. The databases we are familiar with, such as MySQL, MongoDB, and Oracle, have built-in functionality to search for data within a specified range using query statements or commands. (For example, “Retrieve all data from the ‘animal’ table out of the entire dataset” or “Retrieve all values greater than 5 from the data.”) Moreover, the purpose of using a database is to retrieve and utilize the data that users want. However, blockchains, which are fundamentally databases, lack this data range search functionality. Surprisingly, even the most representative blockchains, Bitcoin and Ethereum, do not have data range search capabilities.

This article aims to explain why blockchains, specifically Bitcoin and Ethereum, lack data range search functionality and how the blockchain structure would need to change to enable this capability.

First, let’s briefly review how data range searching works in traditional databases. To perform data range searches in a database, the stored data must be mapped using data structures, enabling search functionality. This process is typically carried out by what is called the database engine. The assigned index is stored separately from the related data, and whenever the indexed column value changes or new data is added, the index is simultaneously updated. The data structures commonly used for indexing are B-Trees or B+Trees, and databases automatically set and manage the index functionality for primary keys.

When a user requests a data range search, the managed mapping data structure is used to find data that satisfies the range condition, and the corresponding values are presented to the user.

So, what kind of database do blockchains like Bitcoin and Ethereum, where transactions and blocks undergo validation and consensus processes before data is ultimately stored, use? Bitcoin and Ethereum employ key-value databases for actual data storage. The reason for using key-value databases is to achieve fast performance in blockchains and to simply store data, as blockchains were initially designed for data storage only. Specifically, they use a database called LevelDB. The first blockchain, Bitcoin, utilized LevelDB, and subsequently, other major blockchains like Ethereum have followed suit. This is not due to any particular technical feature of LevelDB but rather a convention that has been adopted since Bitcoin’s inception.

Couldn’t LevelDB support data range search functionality to enable this capability in blockchains? To conclude upfront, the current implementations of Bitcoin and Ethereum using LevelDB do not support data range searches because blockchains and their underlying databases lack a database engine for indexing and mapping data structures. In other words, LevelDB, like other lightweight key-value databases, provides APIs for simple read and write operations but does not offer SQL or search functionality.

Why didn’t the blockchain developers include this functionality? If LevelDB does not support it, couldn’t they have incorporated the functionality within the blockchain itself? While there may have been various reasons for excluding this capability from Bitcoin and Ethereum initially, it is possible that they only required a storage space for reading and writing single data points. Performance considerations could also have played a role in the decision.

Regardless of the reasons, Bitcoin and Ethereum currently lack this functionality. As a result, services like btc.com and etherscan.io use external databases with indexing and data search capabilities to provide users with data range search functionality. They achieve this by transferring data from the Bitcoin and Ethereum blockchains to their external databases whenever data is updated and then performing range searches on these databases. In the case of Etherscan, for example, it does not directly display range-searched results from within the Ethereum blockchain. Instead, it transfers data from the Ethereum blockchain to its own database, performs range searches on that database, and displays the results.

However, even though btc.com and etherscan.io support data range searches, this is not a fundamental service provided by the blockchain itself. Offering services by transferring data to external databases raises the possibility of the oracle problem occurring during data transfer, as well as the need for one or more service providers.

To address this issue, a paper presented at ACM WIMS 2019 [1] suggests modifying the internal layer structure of the blockchain and introducing a lightweight database capable of range searches within the blockchain itself.

Classification of database

The paper focuses on Ethereum and proposes an embedded database system approach rather than the external database system method employed by existing services like etherscan.io. This embedded approach aims to enable inherent data range search capabilities within the blockchain system.

System overview

The proposed system architecture introduces additional components to the existing system, including the Register Manager, Query Manager, Modified Block Layer, and a Relational Database (SQLite).

In the existing system, transactions within blocks are stored in the database through the following sequence:

  1. Transactions enter the Ethereum-based blockchain through the Application Layer.
  2. The incoming transactions pass through the Service Interface Layer and undergo transaction validity verification in the Transaction Layer. If the transaction is a smart contract, it is sent to the Ethereum Virtual Machine (EVM) through the Smart Contract Manager for execution, and the resulting transaction undergoes validity verification.
  3. Validated transactions are included in blocks at the Block Layer. These blocks undergo a consensus process before being connected to the main chain.
  4. The contents of the blocks connected to the main chain are stored in the database.

In the proposed model, transactions within blocks are stored as follows:

  1. Transactions enter the Ethereum-based blockchain through the Application Layer.
  2. The incoming transactions pass through the Service Interface Layer and are classified by the Register Manager as either smart contract transactions or regular transactions. If a transaction is a smart contract transaction, this fact is temporarily recorded as a marker.
  3. Transactions proceed to the Transaction Layer, following the same process as in the existing system (refer to step {2} in the existing system flow).
  4. Validated transactions are included in blocks at the Block Layer. These blocks undergo a consensus process before being connected to the main chain.
  5. When the contents of the blocks connected to the main chain are stored in the database, transactions marked by the Register Manager as smart contract transactions are also stored in the Relational Database (SQLite).

The process of using query statements to search for data within the proposed blockchain system is as follows:

  1. A query API transaction enters the Ethereum-based blockchain through the Application Layer.
  2. The incoming transaction passes through the Service Interface Layer and is executed by the Query Manager. Only SELECT queries are allowed, as other query types (particularly INSERT) could modify the consensus-completed blockchain data, compromising the integrity of the entire blockchain system.
  3. The Query Manager accesses the Relational Database, retrieves the results of the requested query, and returns them to the Application Layer for presentation to the user.

The following diagram illustrates this process visually:

Process overview

So, what level of performance improvement was achieved with the proposed model?
The following results show the performance when searching for transactions based on their creation time as the range condition. (For example, “Find all transactions from one hour ago to the most recent transaction!”)

Performance results with the different number of entities

While both systems exhibit increased execution times as the number of transactions increases, the proposed system demonstrates a significant performance advantage for SELECT operations. This is because in the existing Ethereum blockchain, performing range searches requires reading and checking every single transaction to find those that satisfy the condition, which is a time-consuming process. In contrast, the proposed system includes dedicated mechanisms for data searches, resulting in much faster execution times.

However, did the additional structures and processes introduced to achieve this effect adversely impact data storage speed? The results show no significant difference compared to the existing system. The proposed system exhibited superior performance in data range searches, and the added processes and mechanisms for enabling this capability did not impose a substantial burden on the blockchain system.

Resource usage

The study also demonstrated that the proposed additional processes and mechanisms within the blockchain did not significantly impact CPU usage or memory usage, indicating efficient utilization of computing resources.

In the paper, the researchers limited the transactions stored in the Relational Database to those resulting from smart contract execution. It would be interesting to see the results if the scope were expanded to include regular state transactions and receipt transactions in future research. Additionally, the experiments only tested range searches based on time, leaving room for further investigation using specific condition clauses. It would be valuable to have these aspects addressed in follow-up studies.

References

[1] Enabling SQL-Query Processing for Ethereum-based Blockchain Systems | Jongbeen Han, Heemin Kim, Hyeonsang Eom, Jonathan Coignard, Kesheng Wu, Yongseok Son | 2019. 06

[2] Ellis Horowitz,Sartaj Sahni,Dinesh Mehta |Fundamentals of Data Structures in C++ | Computer science press

[3] Wiki | B+Tree | https://ko.wikipedia.org/wiki/B%2B_%ED%8A%B8%EB%A6%AC

[4] The Oracle Problem-An Analysis of how Blockchain Oracles Undermine the Advantages of Decentralized Ledger Systems | Alexander Egberts

[5] LevelDB | Google | https://github.com/google/leveldb/blob/master/doc/index.md

--

--