🗓️ Week 12
I started this week with a clear goal: prepare a report on EVM memory trends on the mainnet. It seemed like a straightforward task—until it wasn’t!
The plan
I began by reviewing Geth's debug RPC endpoints. The debug_traceTransaction method replays a transaction and outputs an EVM trace, including memory usage:
curl <RPC URL> \
-X POST \
-H "Content-Type: application/json" \
-d '{
"method": "debug_traceTransaction",
"params": [
"0x990e14fba083e9350f78da6dad285634ab23c448df9930a69dfcb9abe9ded6a7",
{
"enableMemory": true
}
],
"id": 1,
"jsonrpc": "2.0"
}'
Here’s a sample EVM trace:
{
"gas": 171014,
"failed": false,
"returnValue": "",
"structLogs": [
{
"pc": 0,
"op": "PUSH1",
"gas": 481029,
"gasCost": 3,
"depth": 1,
"stack": [],
"memory": []
},
...
]
}
The log includes call depth, stack, and memory contents. My plan was to perform frame-by-frame memory analysis. Fortunately, Mario provided access to his RPC node with debug endpoints enabled.
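Concretely, frame-by-frame analysis means walking the structLogs and recording the memory footprint at every opcode and call depth. A minimal sketch of that pass (the trace_memory helper is illustrative rather than my exact script, and the RPC URL is the same placeholder as above):
import requests

RPC_URL = "<RPC URL>"  # placeholder: any node with the debug API enabled

def trace_memory(tx_hash):
    """Fetch a struct-log trace and pull out per-opcode memory information."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "debug_traceTransaction",
        "params": [tx_hash, {"enableMemory": True}],
    }
    trace = requests.post(RPC_URL, json=payload, timeout=600).json()["result"]
    frames = []
    for log in trace["structLogs"]:
        frames.append({
            "depth": log["depth"],
            "op": log["op"],
            "gas_cost": log["gasCost"],
            # Geth reports memory as a list of 32-byte hex words
            "memory_size": 32 * len(log.get("memory") or []),
        })
    return frames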
I quickly developed a script to scrape the data and initially decided to output it in JSON format:
{
"block_number": {
"tx_hash": {
"gas": <tx total gas>,
"logs": {
"call_frame": {
"memoryGas": <call frame memory gas>,
"memorySize": <call frame memory size>
}
}
}
}
}
However, the JSON file size quickly grew to 200MB per block. Streaming data to JSON proved cumbersome, so I switched to a .csv format:
block,tx_hash,tx_gas,to,call_depth,memory_instruction,memory_access_offset,memory_gas_cost,pre_active_memory_size,post_active_memory_size,memory_expansion
Since CSV doesn’t support hierarchical data, I flattened it.
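Streaming the flattened rows is then just an append-only CSV write. A minimal sketch (write_rows and the row dictionaries are illustrative, not my exact script):
import csv

FIELDS = [
    "block", "tx_hash", "tx_gas", "to", "call_depth",
    "memory_instruction", "memory_access_offset", "memory_gas_cost",
    "pre_active_memory_size", "post_active_memory_size", "memory_expansion",
]

def write_rows(path, rows):
    """Append flattened rows to the CSV, emitting the header only for a new file."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if f.tell() == 0:
            writer.writeheader()
        writer.writerows(rows)
The nice part is that rows can be appended as they are produced instead of accumulating a giant in-memory structure.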
The process ran smoothly for about 100 blocks before crashing with the following error:
{
"jsonrpc": "2.0",
"id": 1,
"error": {
"code": -32000,
"message": "historical state not available in path scheme yet"
}
}
Of course. Only archive nodes store more than 128 blocks of historical state, but I needed data from far more than 128 blocks. Pawel suggested analyzing blocks from the past week, approximately 50,400 blocks given a 12-second block time. I decided to aim for 100,000 blocks.

Archive nodes are resource-intensive. A "full" archive node, which retains all state back to Genesis, requires over 12TB of storage. I'll revisit this option if necessary.
First, I need to see if any RPC providers offer debug endpoints for archive data. This is an unusual requirement, since the archive data people most commonly query is historical balances, code, and storage.
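Checking a candidate endpoint is straightforward: request a trace for a transaction well outside the 128-block window and see whether the historical-state error comes back. A rough probe (the helper name is mine):
import requests

def supports_archive_traces(rpc_url, old_tx_hash):
    """Trace a transaction far outside the recent-state window and check
    whether the node returns a trace or an error."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "debug_traceTransaction",
        "params": [old_tx_hash, {"enableMemory": True}],
    }
    resp = requests.post(rpc_url, json=payload, timeout=600).json()
    return "error" not in resp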
RPC please
I found an RPC provider that offers archived debug data. Querying a week-old block worked well, but there’s a catch:
Processed block 20569001 in 110.50 seconds
Processing one block takes about 100 seconds. So, processing 100,000 blocks would take roughly *checks notes* 155 days.
Next, I’ll explore ways to speed up the process. If that fails, I’ll consider using an archive node as a last resort.
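The first thing I'll try is parallelising the per-transaction trace requests, since a good chunk of those 110 seconds is presumably spent waiting on the provider rather than on my own processing. A sketch of the idea, reusing the trace_memory helper from above and assuming the provider tolerates concurrent calls:
from concurrent.futures import ThreadPoolExecutor

def trace_block(tx_hashes, workers=8):
    """Trace a block's transactions concurrently instead of one by one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(trace_memory, tx_hashes))
Whether this actually helps will come down to the provider's rate limits, so I'll measure before committing to it.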