Keywords

figure a
figure b

1 Overview

Smart contracts running on the Ethereum Virtual Machine (EVM) currently secure assets worth hundreds of billions of dollars  [6]. Despite the high impact of failure, losses due to security incidents are still very common (USD 4 billion across the full Web3 ecosystem in 2022 [15]). Symbolic execution is a rigorous method that evaluates a program using symbolic inputs, exploring the set of reachable final states. By analyzing these states, one can identify potential security flaws, and enhance the overall robustness of smart contracts.

In this paper, we present hevm, a Haskell implementation of the EVM that supports both symbolic and concrete execution. It symbolically analyzes EVM bytecode, exploring all possible states resulting from (potentially) symbolic inputs, storage, and environmental parameters. hevm enables scalably proving the properties of reachable final states, such as the absence of error states like assertion violations or arithmetic overflows. To prove a property, hevm either statically determines the final state to be unreachable, or generates SMT queries for potentially reachable final states and passes them to an SMT solver (Z3 [10], CVC5 [3], or Bitwuzla [20]) in parallel. The solver can disprove the property, returning a counterexample, prove that it can never be violated, or timeout. Additionally, hevm can (dis)prove the equivalence of two final-state sets originating from different EVM bytecode sequences, allowing for equivalence checking between different implementations of the same functionality.

In addition to its command line interface, hevm has a Solidity API that allows users to write symbolic tests in a language with which they are already familiar. This Solidity API, and its associated Solidity library of assertion helpers ds-test [24], has been widely adopted by many tools across the ecosystem and has become the dominant format for testing smart contracts. The format supports standard unit testing, fuzzing, and symbolic execution. In this interface, tests are implemented as public functions on Solidity contracts. By convention, test contracts should inherit from the DSTest contract which provides various assertion helpers. Concrete tests are functions prefixed with test, while symbolic tests have a prove prefix.

As an example, consider a simple Token contract (left) and the corresponding prove_transfer function (right). The contract performs token transfer from the caller’s account to a receiver, if the balance suffices and if the amount is not 42. The prove_transfer function checks whether the balance of the sender has been updated correctly, but it does not account for the edge case.

figure c

If this test is part of a foundry [27] project, then running hevm test from the root automatically discovers and executes the prove_transfer test, returning the expected counterexample.

figure d

hevm uses ABI metadata produced by solc to decode inputs that trigger the assertion violation and displays them in a human-readable format. Optionally, it can also produce a pretty-printed call trace, including log output from the ds-test assertion library.

In the rest of this paper, we do a short survey of related work (Sect. 2), give an overview of the internals of hevm (Sects. 3, 4 and 5), and evaluate hevm against other state-of-the-art tools (Sect. 6).

2 Related Work

Symbolic execution, introduced in the 1970 s to check for property violations of software [5, 13, 16], has garnered significant attention across various domains of software analysis. In this section, we focus on its aspects related to blockchain security, and refer the interested reader to [2] for an excellent survey of symbolic execution in general.

In the broader landscape of blockchain security, prior research has primarily focused on using symbolic execution to validate the reachability of potential issues detected via static code analysis, as done by Oyente [18], sCompile [7], and Mythril [19]. These systems analyze Solidity and EVM bytecode for common vulnerabilities through static code analysis and use symbolic execution to filter out (some) false positives in order to present higher-quality results to their users. These symbolic engines tend not to be complete. For example, Oyente specifically does not fully model the EVM and the underlying blockchain and is known to report false positives even after filtering through its symbolic execution engineFootnote 1. In general, when these symbolic engines cannot prove that a path is unreachable, they fail-safe by allowing the static code analysis to present a potentially false positive to the user.

Within the realm of symbolic execution on blockchain platforms, significant contributions have been made in adapting the technique to the EVM. Notable examples include the Certora Prover [4, 14], EthBMC [11], halmos [23], and KEVM [12]. These tools all employ a variation of the principles of \(\text {S}^{2}\)E [8] in that they perform execution inside a virtual environment at a specific EVM state and attempt to accurately symbolically execute it wherever possible. These tools all make use of some over-approximations (e.g. in the SMT encoding of cryptographic hash functions like SHA3).

Symbolic execution frameworks make extensive use of SMT solvers (e.g. CVC5 [3], Z3 [10], and Bitwuzla [20]) to determine the reachability of a given branch of the execution tree. Recently there have been significant advances made in SMT bit-vector theory solving, with the introduction of local search [21], and the use of integer reasoning for large bit-width formulas [22].

3 Symbolic Interpreter for the EVM

The EVM. The Ethereum Virtual Machine (EVM) is a stack machine designed for verifiable, deterministic execution over a shared state. A so-called consensus protocol is usually used to provide a globally unique ordering of transitions over this shared state. The EVM state is segmented into accounts. Each account has a unique 20-byte identifier (i.e. address), and can be controlled by a private key, or by some EVM bytecode (so-called contract account). Each account has a balance in Ether (the native currency of Ethereum), and contract accounts each have their own region of persistent storage. During contract execution, calls to other contracts generate call frames. Each call frame has its own isolated memory region (a mutable byte array). The calldata and returndata regions are used to pass data between parent and child callframes. Gas, a unit for measuring computational effort, limits and monetizes executions to prevent network abuse.

The Interpreter. The core of the symbolic interpreter is the VM record, holding the machine state at each execution step with individual state items stored as terms of hevm ’s internal representation: Expr (Sec. 4). The interpreter will fully evaluate concrete subcomputations, ensuring that terms remain as concrete as possible.

The interpreter executes by repeatedly applying the exec1 function to update the VM record within Haskell’s State monad. This function processes the opcode at the pc value in the input state and returns an updated state. For efficiency in resource-intensive tasks, the St  [17] monad is used for in-place memory mutation when hevm can be sure that all relevant state entries are fully concrete.

Branching And Eager Exploration. At potential branch points (usually the conditional jump instruction JUMPI), hevm clones the VM state and explores both possible branches. Each branch is executed in parallel using Haskell’s asynchronous runtime. Once the end of execution is reached in each branch, they are summarized into Expr End’s, retaining only the final externally observable effects of each branch. hevm skips almost all SMT queries during exploration. This results in the exploration of provably unreachable paths. Since SMT queries are usually more costly than path exploration, we have found this to be a performance optimization in most cases. Before SMT-based reachability analysis, hevm applies static simplification passes (constant folding, partial evaluation, etc.) that are comparatively very cheap to execute, in order to eliminate unreachable branches during exploration.

Loops, Recursion, and Loop Heuristics. EVM loops are implemented using the JUMPI instruction and are unrolled by hevm during exploration. A naive application of our eager approach to exploration would cause infinite loops (even in loops with a fixed iteration count), as both branches of a JUMPI would always be explored. hevm tackles this potential issue in the following two ways. (1) User-supplied maximum iteration bound. In this approach, loops are unrolled only up until some user-defined depth. This approach guarantees termination even with unbounded loops, but can lead to incompleteness of exploration (for which a warning is printed). (2) Automated detection of bounded loops. In this approach, a loop detection heuristic is applied and whenever the loop is about to be executed again, an SMT call is made to determine branch condition satisfiability. If the SMT solver can determine that the loop cannot run one more time (e.g. because its condition is always falsified after k iterations), the loop is exited. This approach retains the completeness of exploration while guaranteeing termination for bounded loops only.

EVM control flow, which implements both branching and looping via JUMPI, complicates loop detection. A naive approach, classifying repeated visits to the same JUMPI as a loop, can misidentify repeated branches as loops. hevm overcomes this by recording the stack content at each JUMPI visit. A subsequent encounter with the same JUMPI leads to a comparison of valid jump destinations on the stack. A change in these destinations indicates a branch, while consistency suggests a loop. This heuristic, based on the assumption that reused basic blocks have differing return locations, is effective for most Solidity-generated code.

Gas. In the context of symbolic analysis of EVM programs, gas accounting is not usually relevant for assessing most safety properties. The primary concern is potential branching on the result of the GAS opcode, which allows introspection into the remaining gas. hevm incorporates a precise gas model for concrete execution. For symbolic execution, we implement an abstracted gas model. When executing symbolically, precise gas tracking is omitted. Instead, executing the GAS opcode introduces a unique symbolic value on the stack to represent the remaining gas.

Remote State and RPC Calls. hevm supports symbolic execution against a concrete state from a blockchain node adhering to the Ethereum RPC interface [1]. This enables execution against the live state of networks like Ethereum. In this mode, when executing the SLOAD/CALL opcodes, hevm makes an RPC call to a user-defined node to fetch storage values and contract code from the node. Execution against remote state can be very valuable when testing since it allows users to easily write tests that exactly mirror the production environment that their code is planned to be deployed into.

4 Expr, hevm ’s Internal Representation

The result of symbolic execution is represented using hevm ’s internal representation, Expr, that encodes final states and the path conditions leading to them. hevm leverages Haskell’s type system and indexes Expr with the type of expression it represents, using a so-called generalized algebraic data type (GADT). This allows hevm to build terms that are well-typed by construction. An expression e has a type Expr a, where a can be any of the types Byte, EWord, Storage, Buf, EAddr, EContract, and End, which we explain below.

Expr End : Top-level expressions. At the top-level, an expression is a tree with if-then-else statements as branches, representing branching points of execution, and final states as leaves. Final states can be either a successful final state, a failed state, or a partial execution state.

figure e

A Success node represents a successful final state and contains the returndata (of type Expr Buf) and the resulting EVM state, which is a contract map from (potentially symbolic) contract addresses to the contract’s state (type Map (Expr EAddr) (Expr EContract)). The state of a contract is a record consisting of its bytecode, storage (of type Expr Storage), balance, and nonce.

figure f

Partial nodes represent cases where hevm prematurely halted execution, either due to an unsupported execution state (e.g. a symbolic JUMPDEST, or a call into an address with unknown code), or the loop unrolling bound being reached.

Each final state is annotated with a list of logical propositions of type [Prop], representing the path conditions leading there. The tree structure can be flattened to a list of final symbolic states, accumulating the if-then-else conditions as path conditions in the final state nodes. For example the term

figure g

is flattened down to the list:

figure h

Expr Byte and Expr Word : Bytes and 256-bit words. Words can be concrete or symbolic. Operations on words include arithmetic, boolean, and bitwise operations. Bytes can be concrete literals or symbolic expressions but hevm does not have symbolic byte variables. Expr has specific abstract values representing the block, transaction, or frame context.

figure i

Expr Buf : Buffers. Memory, calldata, and returndata are all represented as buffers. A base buffer is either a concrete buffer, represented as a byte string, or a symbolic buffer variable. Operations on buffers include reading and writing bytes and words, copying slices, getting the length, and computing the Keccak cryptographic hash.

figure j

Expr Storage : Storage. An individual contract’s storage maps word-sized storage addresses to words. Base storage values are either a concrete map or an abstract variable annotated with the address of the contract it belongs to.

figure k

Expr EAddr : Addresses. An account address (not to be confused with a storage address) is a 160-bit unique identifier created when a new account is added to the state. For soundness, hevm asserts that all symbolic contract addresses that are present in the the contracts map are pairwise distinct. the aliasing of symbolic addresses must be prevented.

figure l

Prop : Assertions. Various assertions, such as path conditions, are represented in Haskell with the Prop datatype. The datatype includes boolean connectives, polymorphic equality, and comparison operators over hevm words. This is distinct from the EVM level boolean operations, which operate on, and return 256-bit Words.

figure m

4.1 Expr Simplification

The Expr generated by the symbolic interpreter can be verbose with many opportunities for simplification. To simplify it, the following set of rewrite rules are applied: (1) Constant folding is a technique that replaces an expression with its concrete value, if it can be computed e.g. via applying arithmetic operations such as addition modulo \(2^{256}\). (2) Canonicalization is performed, making sure that wherever possible, the first argument of a 2-argument operator is always a constant. This helps improve the effectiveness of the following steps. (3) Partial Evaluation is performed wherever possible. For example Max (Lit 0) (Var"a") is evaluated to Var "a". (4) Arithmetic simplifications are applied, to reduce the overhead of execution. For example, Sub (Add x y) y is rewritten to x. (5) Write simplifications help eliminate a memory write in case there is no corresponding read to the affected memory region or the write has been overwritten by a subsequent write.

Sets of assertions, expressed as list of Prop ’s can also be simplified. Hence, hevm performs (6) Equivalence propagation, a technique where if e.g. an Expr forces Var "a" to Lit 5, then Var "a" can be safely replaced in every other Expr with Lit 5. To make the final expression easier to deal with, hevm also applies (7) Trivial constraint deletion to remove all constant true constraints, and remove all constraints but the constant false, in case a constant false is found.

4.2 Example Program in Expr

Figure 1 shows how a simple Solidity program looks in Expr once the Expr has been simplified. The end-states of the program are represented in orange and the branch conditions in green. Solidity inserts a reverting end-state when Ether is sent to the function (i.e. TxValue is larger than zero) since the function is not marked as payable. The next end-state is due to the require statement that forces a revert in case the condition val1 < 10 doesn’t hold. Finally, the assert forces either a revert or a success depending on whether val1 can overflow.

5 SMT Encoding

The core function used for verification in hevm is assertProps which takes a list of Prop ’s (the data type of propositions) and turns it into an SMT2 script to check its validity. It first declares any necessary symbolic variables (buffers, stores, addresses, words, EVM context) mentioned in the input and then asserts the propositions translating Expr and Prop into SMT2. We discuss some subtle points of this encoding below.

Fig. 1.
figure 1

The result of symbolically executing the call test(uint val1).

Keccak. Solidity uses the Keccak hash function to adapt its high-level storage model to the EVM’s flat storage structure. For example, for the mapping data type the value of key k is located at the storage slot \(\texttt{keccak256}(k || p)\), where p is a compile-time constant [28] and || is concatenation. The security of this scheme relies on the hash function being collision-free. For example, in the following Solidity contract, absence of injectivity can cause a collision in the storage slot of a1 and a2.

figure n

In the SMT encoding, Keccak is an uninterpreted function. To encode injectivity without using quantifiers, hevm gathers all buffers onto which Keccak is applied and for each possible pair (b1, b2), adds the assertion b1 != b2 \(\rightarrow \) Keccak(b1) != Keccak(b2). We also need to assert that the “gap” between two Keccak calls is large enough (currently it is arbitrarily set to be at least 256). This is important for avoiding collisions between Solidity arrays.

Storage Decomposition. While the encoding above is accurate, the usage of uninterpreted functions has been empirically identified as a bottleneck in SMT query execution. A key optimization in hevm is the analysis of storage accesses to reconstruct the original high-level Solidity storage layout, an idea originally implemented by halmos. The required analysis is done without making any additional trust assumptions, i.e. hevm does not utilize the storage layout metadata made available as part of the compiler output. The inferred structure is used in SMT queries, assigning separate SMT arrays to each logical storage region and eliminating Keccak calls in the generated SMT. This optimization is executed through rewrite rules on Expr, applied only when safe. The current implementation supports 1-dimensional mappings and arrays, with plans to expand to more complex nested structures.

Abstraction-Refinement. Abstraction-refinement [9] is a method used to improve the performance of SMT solvers. Select (often more complicated) parts of the formula are abstracted into a fresh new variable(s) and satisfiability is checked. If this new formula is UNSAT, then the original formula must have been UNSAT too since abstraction only weakened it. If the new formula is SAT, the solver may have encountered a spurious counterexample. In such cases, hevm performs so-called refinement to constrain the fresh variables to be (closer to) the original value.

hevm performs two kinds of abstraction-refinement, one for complex arithmetic operations, and one for EVM memory operations. In both cases, hevm does not perform counterexample-guided refinements. Instead it refines all abstracted-away elements in one go.

Fuzzing over the Symbolic Endstate. Some common properties of interest, such as those involving complex non-linear arithmetic, can be very hard for an SMT solver to analyze. If unsafe, counterexamples to these properties can often be found with little effort via fuzzing. hevm by default applies a short round of fuzzing over the symbolic endstate of interest before passing it to the solver.

6 Evaluation

Correctness. hevm is tested to pass the standard EVM test suite [26], which is a set of test vectors to validate the conformance of an EVM client to the expected semantics. This gives confidence in the concrete part of hevm ’s semantics. In addition, hevm comes with a differential fuzzing harness against geth, the most popular EVM concrete execution client. This fuzzing framework exercises both the concrete and symbolic parts of hevm ’s semantics. It randomly generates a bytecode sequence, symbolically executes it, concretizes any abstract variables, and simplifies it down to a fully concrete result. This concrete result is then checked against the result produced by geth. Finally, hevm contains a set of SMT-based semantic fuzzing harnesses to check the various simplification steps, making sure the semantic meaning of Expr terms is not changed through simplification.

Performance. To validate the performance of hevm, we collaborated with the authors of competitive symbolic analysis tooling halmos [23] and KEVM [12] to produce a set of shared benchmarks publicly available on GitHub [25]. We hope this benchmark set can become a useful standard that future tooling can use to validate performance, as well as an interface for feedback from users to tool developers. It currently contains 199 benchmarks: a mix of conformance tests (focused on the correctness of a specific EVM feature or opcode) and performance tests that mimic real-world use cases (ERC20 and ERC721A tokens, the beacon chain deposit contract, and various CTF challenges). Test harnesses are currently implemented for hevm, halmos [23], and KEVM (through kontrol [29]), the tools that support the ds-test format. kontrol and halmos were run with default settings, hevm was run with both the default settings (using z3) and with its bitwuzla backend. Each tool was given a 300 s time limit and 110GB memory limit for each benchmark. The benchmarks were run on an AMD 5950x with 128GB RAM (Table 1).

Table 1. Performance and correctness of different symbolic analysis tools on the benchmark set [25]

Results. All tools reported false positives, indicating a counterexample for tests marked as safe. In the case of kontrol, the majority resulted from differences in revert handling: kontrol treats all reverts as test failures, while halmos and hevm only consider reverts from assertion violations as failures. Newer versions of kontrol have added a mode to align with the hevm and halmos semantics. For hevm and halmos, the false positives were caused by the various overabstractions utilized by each tool, e.g. in the SMT encoding of Keccak.

Each tool also reported false negatives, where a test was marked as safe but should have shown a counterexample. For hevm, this was due to a now-fixed bug. Another false positive was reported by both halmos and kontrol: both tools pass concretely sized calldata to the top level test method, meaning they missed assertions in branches that could only be reached if the input calldata had an abstract size. After discussion with the authors of halmos and kontrol, we have agreed to standardize on concrete calldata size. kontrol had an additional false positive due to differences in the concreteness of test inputs (callvalue and test contract balance). We have not yet reached a consensus on which approach should be standard in this case.

Fig. 2.
figure 2

CDF plot of the results. False negatives/positives are excluded.

Fig. 3.
figure 3

Pairwise comparison of hevm, halmos, and KEVM. In comparison to halmos, hevm solves more instances, although can sometimes be slower. In comparison to KEVM, hevm is both faster and solves more instances.