Building up from the Ethereum Bytecode

At RET2, we take pride in the breadth and depth of the security research that we perform. Our passion and pursuit of niche subjects has taken us through all walks of the industry. In each new space we touch on, we hope to leave it having made some measurable innovation. But as interest fades, we oftentimes archive this work and move on.

This past weekend, we encountered a candid opportunity to publicly exercise some of our previously unpublished research and technology. Specifically, this blogpost will detail the practical application of an interactive decompiler we wrote for the Ethereum bytecode.

We demonstrate how this technology trivialized the reverse engineering of a closed source smart contract for the qualification round of DEFCON CTF 2018, continuing our narrative on the importance of interactive security tooling.

Decompiling the EVM bytecode from within Binary Ninja

The Challenge

DEFCON CTF is an annual computer security competition that attracts some of the best hackers from around the globe to solve a number of complex challenges in less than 48 hours.

This year, one of the challenges from the qualification round tasked players to reverse engineer a ‘gambling’ based Ethereum smart contract. Named SAG, the challenge provided source code to a smart contract and a link for the contract deployed on the Ropsten Ethereum Testnet.

Looking at the Solidity source provided with this challenge, we don’t have much to go on. The source is actually for a proxy contract, and the only function of interest is gamble(uint256 guess, uint256 seed) as abbreviated below.

pragma solidity ^0.4.17;

import "./Sag.sol";

contract SagProxy {
    ...

    Sag private sag;
    ...

    function gamble(uint256 guess, uint256 seed) public
    {
        sag.gamble(guess, seed);
    }

   ...
}

From the provided snippet, we see that the gamble function simply calls out to the sag contract object. This means that the real ‘gambling logic’ resides in a second smart contact that source was not provided for.

We must reverse engineer the real sag contract given only its bytecode.

EVM Bytecode

Ethereum smart contracts live on the blockchain in their compiled form of EVM bytecode. This bytecode executes in what is known as the Ethereum Virtual Machine (EVM).

The EVM bytecode represents a stack machine based ISA. Stack machine based applications are notoriously difficult for humans to reverse engineer due to the manner in which variables are stored and operated upon at runtime. The EVM bytecode is no exception.

The EVM bytecode can be very difficult to follow

There are a number of resources available for reverse engineering the EVM bytecode (1, 2, 3, 4, 5), but for the reasons previously stated - this is an expensive and tedious process for even the most seasoned reverse engineers.

EthRays Decompiler

Curiosity has led us in and out of blockchain security research over the past year. One of the more interesting (and recent) projects that emerged from of our time spent exploring the hard problems in this space was an interactive decompiler for the EVM bytecode.

We chose to build our decompiler on top of Binary Ninja & Ethersplay to aid us in the investigation of interesting (or nefarious) closed source smart contracts. Integrating naturally into existing tools, our EVM decompiler draws much of its inspiration directly from the iconic HexRays Decompiler shipped with IDA Pro.

We mirror the important HexRays hotkeys, one-for-one. The F5 key will decompile the EVM bytecode into pseudo-source within Binary Ninja.

Pressing F5 decompiles the current function

This interactive decompilation context supports reverse engineering amenities such as placing comments (/), block comments (INS), renaming variables (N), synchronized variable highlighting, and more.

Comments and variable manipulation

The TAB key can be used to explore how tokens in the decomplation context map to the underlying EVM instructions, or vice versa (disassembly –> decompilation mapping).

The `TAB` key helps correlate disassembly <--> decompilation

Synchronized variable highlighting and the ability to lock (multiple) variable highlights (CTRL+H) makes it easy to follow data as it flows through a given function.

Locking variable highlights across a number multiple variables can illustrate data flow

Using some of these capabilities, we will now demonstrate our process of reverse engineering the DEFCON contract.

Decompiling the Gambling Contract

Clicking the Contracts button included in our small suite of ETH utilities, we are able to open the target contract directly from the blockchain. Binary Ninja + Ethersplay will explore the smart contract, defining functions and building the binary’s control flow graph (CFG).

Opening the `sag` contract directly from the blockchain

Pressing F5, we are able to instantly decompile any given function in the binary. We use this to get a better idea of the landscape for the given contract, and identify code we care to spend time reverse engineering.

Using our EVM decompiler to survey the functions in this contract

After poking through the list of discovered functions, we see that the function with hash #22e8c8fc has the most code by a wide margin. We can speculate that this function is probably the gamble(uint256 guess, uint256 seed) function that we are interested in.

Let’s walk through the first part of the decompiled code:

The initial round of checks performed within the gamble(...) function

At the top of this function, we see three individual checks:

CALLVALUE (the amount of ETH sent in this transaction) must be zero, eg: this function is not payable
GAS (the amount of ETH sent to pay for contract execution) must be less than a global value in STORAGE[2]
Finally, a key:value pair correlated with the caller’s ETH address must be zero
- We speculate this probably represents if the caller has been previously marked as a winner

Simple enough, let’s move on to the next meaningful part of section of this function:

The raw decompilation of the second part of the the gamble(...) function

At first glance, the non-descript variable names can make this loop look intimidating, but the interactivity of the decompiler allows us to quickly demystify it.

Annotating the first loop structure in the gamble(...) function

The loop does some simple hashing of user controlled input (the seed), XORing it with the caller’s ETH address, and storing successive rounds of this operation in a global storage array of 32 (0x20) elements.

Moving on to the third part of this function, we find a more complex set of nested loops. Again, we can use the interactivity of the decompiler to break down this logic in a matter of minutes.

Casual annotation of the nested loop structure in the gamble(...) function

We see it comparing two values from the global g_hashes array inside a double for loop. If the second value is larger, the values are swapped. These loops simply order the global array in-place from largest hash to smallest hash. This is the well-known selection sort algorithm!

The fourth and final section of the decompiled code checks whether our input argument (the guess) matches the first entry of the sorted array of hashes. If we guessed correctly, our address is logged and the hashmap entry for our address is set to one, marking us as a winner.

The final section of the gamble(...) function evaluates if the user guess is correct

At this point, we understand the gamble(...) function well enough to generate winning inputs for the contract.

Solving the Challenge

The contract has one last trick up its sleeve. If you recall from the beginning of the gamble function, we are limited by the amount of gas we can send in.

The artifcial GAS limit imposed by the gamble(...) function

In the Ethereum Virtual Machine, each bytecode instruction costs some amount of gas to execute. Gas is used to pay for ‘cpu time’ on the decentralized network.

A sampling of some EVM instructions & their GAS prices

Querying the blockchain, we learn that STORAGE[2] will limit us to 2000000 gas in our call to gamble(...). In the EVM, each STORAGE operation costs ~5000 gas to execute, so each swap operation in the sort requires ~10000 gas. This means we can run at most 200 storage swaps before running out of ‘cpu time’ and aborting execution.

To meet this constraint, we brute forced a seed value (one of the inputs which we control) that would slide under the gas limit of the gamble(...) function. We found that the seed 0x1355e65 would consume only 188 swaps, with its winning number (the guess) being 0xf3fc5acd82e9ac42e9b34a38e39198d34930c088f0650911b51cf6cb0c7f8a8

Sending these values to the contract successfully completed the transaction and marked our ETH address as a winner.

Receipt for the winning transaction (Ropsten TESTNET)

To claim the flag, we called the requestPrize(bytes32 msgHash, uint8 v, bytes32 r, bytes32 s) function in the original (proxy) contract with a signed message to prove we owned the private key for the calling address. The challenge then used our public key to encrypt the flag, and sent it via the deliverPrize(address winner, bytes prize) function.

Decrypting this personalized message gave us the flag: OOO{D0_n07_w4st3_tOOO_mUch-GAS!}

We were the first team to solve this challenge.

Conclusion

The use of our research and technology gave us a clear advantage in this competition. Setting aside the novelty of an EVM decompiler, this post illustrates that the value of usability in robust security tooling can be priceless.

We’re excited by the evolution of the blockchain security ecosystem, and only expect the availability and quality of these advanced technologies to improve with time.

ENGINEERING BLOG