🗓️️ Week 18 TLDR: Report is out!
EVM Memory Analysis: Usage Trends
Introduction
The EVM's memory is a word-addressed byte array that stores its ephemeral state. Accessing memory incurs fees in a unit called gas.
Gas does not measure the direct cost of execution, but rather the computational effort required by a node's hardware to execute EVM instructions. Transactors pay a market-determined price per unit of gas, which ultimately determines the execution cost.
The goal of this project is to provide insights that could potentially lead to repricing memory access costs, making them more affordable. To support this analysis, the usage of memory-accessing instructions (listed in Appendix A) was first examined across 100,000 blocks (from 20,770,001 to 20,870,000) on the Ethereum mainnet.
Usage Trends
Initially, about 3.1 billion (3,116,961,839) memory-accessing instructions were recorded. However, certain instructions, such as CALL, may not require input from memory or may discard their return data instead of writing it to memory. In these instances, the instruction references a zero-length range of memory, which does not trigger an expansion and incurs no memory gas cost. In total, 62,086,976 such instructions (≈2% of all memory-accessing instructions) were identified and excluded from further analysis, which brings the total instruction count down to 3,054,874,863.
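To make the zero-length case concrete, here is a minimal sketch of the memory expansion charge, based on the Yellow Paper's memory cost function (3 gas per word plus a quadratic term of words² / 512). The function names `memory_cost` and `expansion_cost` are illustrative and do not come from the analysis code.

```python
WORD_SIZE = 32  # bytes per EVM word


def memory_cost(size_in_bytes: int) -> int:
    """Total gas attributed to a memory of the given size (rounded up to words)."""
    words = (size_in_bytes + WORD_SIZE - 1) // WORD_SIZE
    return 3 * words + words * words // 512


def expansion_cost(current_size: int, offset: int, length: int) -> int:
    """Gas charged for growing memory so that [offset, offset + length) fits."""
    if length == 0:
        # A zero-length range never expands memory, whatever the offset is.
        return 0
    new_size = max(current_size, offset + length)
    return memory_cost(new_size) - memory_cost(current_size)


print(expansion_cost(0, 0, 32))       # touching the first word of empty memory
print(expansion_cost(0, 10_000, 0))   # zero-length range at a high offset
print(expansion_cost(64, 0, 32))      # range already inside allocated memory
```

Only the first call pays an expansion charge; the zero-length access and the access inside already-allocated memory both cost nothing for expansion, which is why the former can be dropped from a memory pricing analysis.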
import vaex
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from scipy.stats import norm
# CONFIG
PERCENTILE_RANGES = [25, 50, 75, 95, 99]
PRIMARY_COLOR = '#731963'
ACCENT_COLOR_1 = '#F0E100'
ACCENT_COLOR_2 = '#808080'
# Load data
df = vaex.open('data/combined_call_frames.hdf5')
opcode_to_mnemonic = {"20":"KECCAK256","37":"CALLDATACOPY","39":"CODECOPY","51":"MLOAD","52":"MSTORE","53":"MSTORE8","3c":"EXTCODECOPY","3e":"RETURNDATACOPY","5e":"MCOPY","a0":"LOG0","a1":"LOG1","a2":"LOG2","a3":"LOG3","a4":"LOG4","f0":"CREATE","f1":"CALL","f2":"CALLCODE","f3":"RETURN","f4":"DELEGATECALL","f5":"CREATE2","fa":"STATICCALL"}
df["opcode"] = df["opcode"].map(opcode_to_mnemonic)
df
Memory Size
Across 100,000 blocks, a total of 111.5 GiB of memory was accessed (whether or not the access expanded memory). Each instruction accesses between 1 and 460,800 bytes of memory, with an average of 39 bytes (slightly over one EVM word). The maximum of 460,800 bytes was reached by a single CODECOPY instruction. On average, the CREATE*, *CODECOPY, and *CALL families of instructions access the most memory, followed by KECCAK256.
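As a quick sanity check, the reported average can be reproduced from the two totals quoted in this report. This is only back-of-the-envelope arithmetic on the rounded 111.5 GiB figure, so the result is approximate.

```python
GIB = 2**30  # bytes per gibibyte
WORD_SIZE = 32  # bytes per EVM word

total_accessed_bytes = 111.5 * GIB   # total from this section (rounded)
instruction_count = 3_054_874_863    # instructions left after filtering

average_access = total_accessed_bytes / instruction_count
print(f"{average_access:.1f} bytes per access")
print(f"{average_access / WORD_SIZE:.2f} EVM words per access")
```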
df_summary = df.describe()
def show_column_summary(column):
    column_summary = df_summary[[str(column)]].copy(deep=True)
    percentiles = np.percentile(column.values, PERCENTILE_RANGES)
    for i, percentile in enumerate(percentiles):
        column_summary.loc[f"{PERCENTILE_RANGES[i]} percentile"] = percentile
    return column_summary
show_column_summary(df.memory_access_size)
Total memory access across all blocks
df.memory_access_size.sum()
Frequently accessed memory
The histogram covers only sizes below the 99th percentile, so that outliers do not dominate the plot.
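The idea of trimming at a percentile can be sketched on a small, made-up sample. The notebook itself relies on `np.percentile`; the sketch below uses the standard library's `statistics.quantiles` instead so that it runs without dependencies, and the `sizes` list is hypothetical data rather than anything from the dataset.

```python
import statistics

# Hypothetical sample: 990 single-word accesses plus 10 large copies.
sizes = [32] * 990 + [1_000 * k for k in range(1, 11)]

# quantiles(n=100) returns 99 cut points; the last one is the 99th percentile.
cutoff = statistics.quantiles(sizes, n=100, method="inclusive")[-1]

# Keep only the values at or below the cutoff before plotting.
trimmed = [size for size in sizes if size <= cutoff]

print(f"cutoff={cutoff}, kept {len(trimmed)} of {len(sizes)} values")
print(f"max before={max(sizes)}, max after={max(trimmed)}")
```

In the plotting code, the upper bound in `limits=[0, 132]` appears to play the role of this cutoff.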
plt.figure(figsize=(10,6))
df.viz.histogram('memory_access_size', limits=[0,132], lw=3, shape=64, color=PRIMARY_COLOR, label="Histogram of Memory Access Sizes")
plt.xlabel('Size of Memory Access [Bytes]')
mean_size = df.memory_access_size.mean()
plt.axvline(mean_size, color=PRIMARY_COLOR, linewidth=1, linestyle='--', label=f"mean(μ)={mean_size:.2f}")
plt.legend()
plt.show()
Memory consumption by instruction
vaex.settings.display.max_rows = 30
opcode_range = df.groupby("opcode", agg={"memory_access_size": ["mean", "max", "min"]}).sort(["memory_access_size_mean", "memory_access_size_max", "memory_access_size_min"], ascending=False).to_pandas_df()
opcode_range
melted_opcode_range = opcode_range.melt(id_vars='opcode',
                                        value_vars=['memory_access_size_mean', 'memory_access_size_max', 'memory_access_size_min'],
                                        var_name='memory_access_type',
                                        value_name='size')
plt.figure(figsize=(12, 6))
sns.boxplot(data=melted_opcode_range, x='size', y='opcode', color=PRIMARY_COLOR)
plt.title('Memory Access Size Distribution by Instruction')
plt.xlabel('Memory Access Size [Bytes]')
plt.ylabel('Instruction')
plt.show()
Memory Expansion
Across 100,000 blocks, a total of 32.14 GiB of memory was expanded. A single instruction expanded memory by up to 559,328 bytes, with an average of 11 bytes per instruction. MSTORE and the *CODECOPY and *CALL families of instructions produce the largest single expansions.
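The difference between memory that is accessed and memory that is expanded can be sketched as follows. This assumes expansion is measured as the word-aligned growth of a call frame's memory; how the dataset's `memory_expansion` column was actually derived is not stated here, and the access trace below is made up.

```python
WORD_SIZE = 32  # bytes per EVM word


def expanded_bytes(current_size: int, offset: int, length: int) -> int:
    """Bytes by which memory grows to cover [offset, offset + length)."""
    if length == 0:
        return 0
    end = offset + length
    new_size = -(-end // WORD_SIZE) * WORD_SIZE  # round up to a whole word
    return max(0, new_size - current_size)


# Hypothetical trace of (offset, length) pairs within one call frame.
accesses = [(0x00, 32), (0x20, 32), (0x00, 32), (0x20, 32), (0x40, 32), (0x40, 32)]

memory_size = 0
total_accessed = 0
total_expanded = 0
for offset, length in accesses:
    grown = expanded_bytes(memory_size, offset, length)
    memory_size += grown
    total_accessed += length
    total_expanded += grown

print(f"accessed={total_accessed} bytes, expanded={total_expanded} bytes")
```

Repeated accesses to the same region count toward the accessed total every time but expand memory only once, which is consistent with the accessed total being several times larger than the expanded total. Conversely, one access at a high offset can expand more bytes than it touches: `expanded_bytes(0, 0x40, 32)` grows memory by 96 bytes for a 32-byte access.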
show_column_summary(df.memory_expansion)
Total memory expansion across all blocks
df.memory_expansion.sum()
Distribution of memory expansion
plt.figure(figsize=(10,6))
df.viz.histogram('memory_expansion', limits=[0,132], lw=3, shape=64, color=PRIMARY_COLOR, label="Histogram of Memory Expansion Size")
plt.xlabel('Size [Bytes]')
mean_expansion = df.memory_expansion.mean()
plt.axvline(mean_expansion, color=PRIMARY_COLOR, linewidth=1, linestyle='--', label=f"mean(μ)={mean_expansion:.2f}")
plt.legend()
plt.show()
expansion_by_opcode = df.groupby("opcode", agg={"memory_expansion": ["mean", "max", "min"]}).sort(["memory_expansion_mean"], ascending=False).to_pandas_df()
expansion_by_opcode
Memory Offset
Half of all memory accesses occur at offsets of 4 words (128 bytes) or less. The most frequently accessed offset is 64 bytes (0x40), where Solidity stores its free memory pointer.
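For context, the low offsets are significant because Solidity reserves the first 128 bytes of memory for its own bookkeeping. The sketch below encodes that layout as documented by Solidity; the helper `describe_offset` is an illustrative name, and contracts produced by other compilers may use memory differently.

```python
# Reserved regions at the start of memory in Solidity-compiled contracts,
# given as (start, end_exclusive, purpose).
SOLIDITY_LAYOUT = [
    (0x00, 0x40, "scratch space for hashing"),
    (0x40, 0x60, "free memory pointer"),
    (0x60, 0x80, "zero slot"),
]
FREE_MEMORY_START = 0x80  # first byte available for dynamic allocation


def describe_offset(offset: int) -> str:
    """Return the role of a memory offset under the Solidity layout."""
    for start, end, purpose in SOLIDITY_LAYOUT:
        if start <= offset < end:
            return purpose
    return "dynamically allocated memory"


for offset in (0, 64, 96, 128):
    print(f"offset {offset:>3} (0x{offset:02x}): {describe_offset(offset)}")
```

Under this layout, every allocation reads and updates the word at offset 64, and the 128-byte median sits exactly at `FREE_MEMORY_START`.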
show_column_summary(df.memory_access_offset)
Offset distribution below the 95th percentile
plt.figure(figsize=(10,6))
df.viz.histogram('memory_access_offset', limits=[0,3232], lw=3, shape=64, color=PRIMARY_COLOR, label="Histogram of Memory Offsets Accessed")
plt.xlabel('Offset [Bytes]')
mean_offset = df.memory_access_offset.mean()
plt.axvline(mean_offset, color=PRIMARY_COLOR, linewidth=1, linestyle='--', label=f"mean(μ)={mean_offset:.2f}")
plt.legend()
plt.show()
64 bytes is the most common memory offset, which is consistent with its role as the offset of the Solidity free memory pointer.
offset_value_counts = df.memory_access_offset.value_counts().head(10)
plt.figure(figsize=(18,6))
sns.barplot(x=offset_value_counts.index, y=offset_value_counts.values, color=PRIMARY_COLOR, order=offset_value_counts.index)
plt.xlabel('Offset [Bytes]')
plt.title("Top 10 Most Frequently Accessed Memory Offsets")
plt.ylabel('Count')
plt.show()
Gas cost
A total of about 80.5 trillion (80,513,608,146,590) units of gas were recorded for memory-accessing instructions, with an average gas cost of 26,355.8 per memory access. The *CALL family of instructions consumes the most gas due to its significant memory usage.
show_column_summary(df.opcode_gas_cost)
df.opcode_gas_cost.sum()
plt.figure(figsize=(10,6))
df.viz.histogram('opcode_gas_cost', limits=[0,360263], lw=3, shape=20, color=PRIMARY_COLOR, label="Histogram of Gas Cost")
plt.xlabel('Gas Cost')
mean_gas_cost = df.opcode_gas_cost.mean()
plt.axvline(mean_gas_cost, color=PRIMARY_COLOR, linewidth=1, linestyle='--', label=f"mean(μ)={mean_gas_cost:.2f}")
plt.legend()
plt.show()
opcode_cost_range = df.groupby("opcode", agg={"opcode_gas_cost": ["mean", "max", "min"]}).sort(["opcode_gas_cost_mean"], ascending=False).to_pandas_df()
opcode_cost_range
melted_opcode_cost_range = opcode_cost_range.melt(id_vars='opcode',
                                                  value_vars=['opcode_gas_cost_mean', 'opcode_gas_cost_max', 'opcode_gas_cost_min'],
                                                  var_name='opcode_gas_cost_type',
                                                  value_name='size')
plt.figure(figsize=(12, 6))
sns.boxplot(data=melted_opcode_cost_range, x='size', y='opcode', color=PRIMARY_COLOR)
plt.title('Gas Cost Distribution by Instruction')
plt.xlabel('Gas')
plt.ylabel('Instruction')
plt.show()
Instructions
The MSTORE and MLOAD instructions account for more than 83% of memory accesses; KECCAK256 follows at 6%.
opcode_value_counts = df.opcode.value_counts()
relative_percentages = (opcode_value_counts / opcode_value_counts.sum()) * 100
pd.DataFrame({
    'Value': opcode_value_counts,
    'Relative Percentage': relative_percentages.map("{:f} %".format)
})
plt.figure(figsize=(18,6))
sns.barplot(x=opcode_value_counts.index, y=opcode_value_counts.values, color=PRIMARY_COLOR)
plt.xlabel('Opcodes')
plt.ylabel('Count')
plt.xticks(rotation='vertical')
plt.show()
Call Depth
The dataset has an average call depth of 2.8, with a 99th percentile of 9 call frames per transaction. Call depth influences memory expansion costs, as each call frame starts with an empty memory state.
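The effect of call depth on expansion cost can be illustrated with the Yellow Paper's memory cost function (3 gas per word plus words² / 512). Because the quadratic term restarts from zero in every call frame, the same amount of memory is cheaper when it is split across frames. The word counts below are arbitrary and chosen only to make the quadratic term visible.

```python
def memory_cost(words: int) -> int:
    """Gas attributed to a call frame whose memory spans `words` EVM words."""
    return 3 * words + words * words // 512


# One frame that grows to 2,048 words (64 KiB) ...
single_frame = memory_cost(2048)

# ... versus two frames that each grow to 1,024 words (32 KiB).
two_frames = 2 * memory_cost(1024)

print(f"single frame: {single_frame} gas")
print(f"two frames:   {two_frames} gas")
```

For the small accesses that dominate this dataset, the linear term matters far more than the quadratic one, so the difference only shows up for frames that allocate tens of kilobytes.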
df_call_frames = df.groupby(['transaction_id']).agg({'call_depth':vaex.agg.max('call_depth')})
show_column_summary(df_call_frames.call_depth)
plt.figure(figsize=(10,6))
df_call_frames.viz.histogram('call_depth', limits="minmax", lw=3, shape=64, color=PRIMARY_COLOR, label="Histogram of Call Depth")
plt.xlabel('Call Depth')
mean_call_depth = df_call_frames.call_depth.mean()
plt.axvline(mean_call_depth, color=PRIMARY_COLOR, linewidth=1, linestyle='--', label=f"mean(μ)={mean_call_depth:.2f}")
plt.legend()
plt.show()
Appendix A: Memory-accessing Instructions
| Instruction | Description |
|---|---|
| KECCAK256 | Compute Keccak-256 hash. |
| CALLDATACOPY | Copy input data in current environment to memory. |
| CODECOPY | Copy code running in current environment to memory. |
| MLOAD | Load word from memory. |
| MSTORE | Save word to memory. |
| MSTORE8 | Save byte to memory. |
| EXTCODECOPY | Copy an account's code to memory. |
| RETURNDATACOPY | Copy output data from the previous call to memory. |
| MCOPY | Duplicate data in memory. |
| LOG0 | Append log record with no topics. |
| LOG1 | Append log record with one topic. |
| LOG2 | Append log record with two topics. |
| LOG3 | Append log record with three topics. |
| LOG4 | Append log record with four topics. |
| CREATE | Create a new account with associated code. |
| CALL | Message-call into an account. |
| CALLCODE | Message-call into this account with an alternative account's code. |
| RETURN | Halt execution returning output data. |
| DELEGATECALL | Message-call into this account with an alternative account's code, but persisting the current values for sender and value. |
| CREATE2 | Create a new deterministic account with associated code. |
| STATICCALL | Static message-call into an account. |