Internals: This doc explains the internals of data locations. While General Notes mentions some differences, this doc goes into more detail.
Internally, there are 4 data locations:
- storage: The actual store on the blockchain. State variables live here and persist between transactions. It has the highest gas cost to read and write.
- memory: Temporary data used during execution. Memory variables only exist inside functions (local variables, parameters and return values of reference types). The node executing the transaction holds it in RAM. Each external call starts with fresh, empty memory, and that memory is discarded once the call completes. It costs less gas than storage.
- calldata: The read-only input data generated by the user and sent with the transaction. All the parameters the user intended to give the contract are located in the calldata; it is the data field of the transaction.
- stack: The EVM's function stack. It is also the default location for value-type local variables inside a function.
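Organisation of storage:
- The EVM operates in 32-byte storage slots, numbered from 0. State variables are assigned to them in declaration order.
- Value types smaller than 32 bytes, like `uint8`, `bool`, `address` and so on, are tightly packed into a shared slot when they fit.
  - So, `uint128` and `uint128` are packed side by side in one 32-byte slot.
- Any element that does not fit in the space left over in the current slot is moved to a new slot.
  - So, `uint128` and then `uint256` take up 2 slots instead of one. Because `uint256` can't fit in the 16 bytes left in the first slot, a new, second 32-byte slot is used.
- Arrays and structs always start in a new slot, and their own items are packed according to the rules above. The element that follows an array or struct also always starts a new slot; it never uses the space left over at the end of the array or struct.
- Note that the contents of dynamic arrays and mappings are NOT stored in the middle of the normal fixed-size state variables. Because their size is unpredictable, each one takes up exactly one 32-byte slot `p` at its position in declaration order, and its actual contents live elsewhere, at a location derived from `p`.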
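To make the packing rules above concrete, here is a minimal Python sketch that assigns slots and byte offsets to value-type state variables declared one after another. It assumes you pass each variable's size in bytes yourself (e.g. 16 for `uint128`, 20 for `address`, 1 for `bool`). `layout`, `slots_used` and `Placement` are hypothetical helper names, and structs, arrays and mappings are deliberately left out.

```python
from dataclasses import dataclass

SLOT_SIZE = 32  # bytes per storage slot


@dataclass
class Placement:
    name: str
    slot: int    # storage slot index
    offset: int  # byte offset inside the slot, from the low-order (right-hand) end


def layout(variables):
    """Place (name, size_in_bytes) value-type variables in declaration order.

    Hypothetical helper: sizes are supplied by the caller, and structs,
    arrays and mappings (which always start a new slot) are not modelled.
    """
    placements = []
    slot, used = 0, 0
    for name, size in variables:
        if not 0 < size <= SLOT_SIZE:
            raise ValueError(f"{name}: value types take 1 to 32 bytes")
        if used + size > SLOT_SIZE:
            # Does not fit in the space left over: move on to a new slot.
            slot, used = slot + 1, 0
        placements.append(Placement(name, slot, used))
        used += size
    return placements


def slots_used(variables):
    placements = layout(variables)
    return placements[-1].slot + 1 if placements else 0


# uint128 + uint128 share slot 0; uint128 + uint256 need two slots.
print(slots_used([("a", 16), ("b", 16)]))  # 1
print(slots_used([("a", 16), ("c", 32)]))  # 2
```

For a real contract, the compiler's own `storageLayout` output (available through solc's standard JSON interface) is the authoritative reference; this sketch only mirrors the simple value-type cases.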
Now let's see where dynamic arrays and mappings keep their actual data.
- Dynamic arrays: Say that, following the rules above, a dynamic array was given the `p`th 32-byte slot. This slot contains the number of elements in the array, i.e. its length. The actual array data is stored starting at slot `keccak256(p)`, with `p` padded to 32 bytes before hashing. That is, while slot `p` holds the length, the hash of `p` is where the array actually starts. The elements themselves are then laid out by the general rules above. Fit in? Pack 'em. Can't fit? New slot. Check out `bytes1[]` vs `bytes` and `string` below.
- Mappings: Say a mapping was given the `p`th slot. This slot stays empty; it is only used as input to the hash, so that different mappings end up in different places. To access the element with key `k`, the formula is `keccak256(h(k) . p)`, where `.` is concatenation, `p` is padded to 32 bytes, and `h(k)` is a function based on the type of the key:
  - If the key is a value type (this includes `bytes1` to `bytes32`), `h(k)` pads `k` to 32 bytes, the same way as when the value is stored in memory.
  - If the key is a `string` or `bytes`, `h(k)` is just the unpadded data itself. No extra hashing is applied before the outer `keccak256`.
Remember that mappings here are different from hash maps in C++, Python and other languages. Those store the keys and look values up through them internally; Solidity has no stored concept of a key. Wait, what?
Yeah: Solidity does not store the (key, value) pair at all. It only stores the value, at the slot computed from the hash of the key and `p` as above. Hence, at a low level, not even Solidity knows which keys we have used. When querying, it simply treats whatever key we supply as existing. If nothing was ever written for that key, the slot is empty and the lookup returns zero (the type's default value).
Hence, you can't `delete` a whole mapping, as Solidity does not store any information about its keys. You can only clear entries whose keys you know, by setting their slots back to zero, e.g. with `delete m[k]`.
Note: Sound any bells? The whole point of this scheme is to avoid collisions, since dynamic types keep growing and shrinking. That's why the contents of dynamic types are placed at a hash of their `p`th slot: the result lands far away from the fixed-size variables and from other dynamic types, so an overlap is practically impossible.
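To see both formulas in action, the sketch below computes the slot of a dynamic-array element and of a mapping value in Python. Python's standard library has no Keccak-256 (`hashlib.sha3_256` is the standardised SHA-3, whose padding differs, so it produces different digests), so the block includes a small pure-Python Keccak-256 written for readability, not speed. `pad32`, `array_element_slot` and `mapping_value_slot` are hypothetical helper names. The array helper assumes elements that take a full 32-byte slot each, and `pad32` only covers unsigned integers, addresses and `bool`; `bytesN` keys are padded on the right and signed integers are sign-extended, which this sketch does not handle.

```python
MASK64 = (1 << 64) - 1

# Rho rotation offsets, indexed ROT[x][y].
ROT = [
    [0, 36, 3, 41, 18],
    [1, 44, 10, 45, 2],
    [62, 6, 43, 15, 61],
    [28, 55, 25, 21, 56],
    [27, 20, 39, 8, 14],
]


def _rol(v, n):
    return ((v << n) | (v >> (64 - n))) & MASK64


def _round_constants():
    """The 24 iota round constants, generated by Keccak's 8-bit LFSR."""
    lfsr, constants = 1, []
    for _ in range(24):
        rc = 0
        for j in range(7):
            if lfsr & 1:
                rc |= 1 << ((1 << j) - 1)
            lfsr = ((lfsr << 1) ^ 0x71) & 0xFF if lfsr & 0x80 else lfsr << 1
        constants.append(rc)
    return constants


RC = _round_constants()


def _keccak_f(a):
    """Keccak-f[1600] on 25 64-bit lanes; lane (x, y) is a[x + 5 * y]."""
    for rc in RC:
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        for i in range(25):  # theta
            a[i] ^= d[i % 5]
        b = [0] * 25
        for x in range(5):  # rho and pi
            for y in range(5):
                b[y + 5 * ((2 * x + 3 * y) % 5)] = _rol(a[x + 5 * y], ROT[x][y])
        for x in range(5):  # chi
            for y in range(5):
                a[x + 5 * y] = b[x + 5 * y] ^ (~b[(x + 1) % 5 + 5 * y] & b[(x + 2) % 5 + 5 * y])
        a[0] ^= rc  # iota


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 as used by the EVM (padding byte 0x01, not SHA-3's 0x06)."""
    rate = 136  # bytes absorbed per permutation for a 256-bit digest
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    state = [0] * 25
    for off in range(0, len(padded), rate):
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(padded[off + 8 * i:off + 8 * i + 8], "little")
        _keccak_f(state)
    return b"".join(lane.to_bytes(8, "little") for lane in state[:4])


def pad32(n: int) -> bytes:
    """Left-pad an unsigned integer (uint, address, bool) to 32 bytes."""
    return n.to_bytes(32, "big")


def array_element_slot(p: int, index: int) -> int:
    """Slot of element `index` of a dynamic array at slot p (32-byte elements)."""
    return (int.from_bytes(keccak256(pad32(p)), "big") + index) % 2**256


def mapping_value_slot(p: int, encoded_key: bytes) -> int:
    """Slot of m[k] for a mapping at slot p, given encoded_key = h(k)."""
    return int.from_bytes(keccak256(encoded_key + pad32(p)), "big")


# First element of a uint256[] declared at slot 0:
print(hex(array_element_slot(0, 0)))  # 0x290decd9548b62a8d60345a988386fc84ba6bc95484008f6362f93160ef3e563
```

A `uint256` key at a mapping in slot `p` would use `mapping_value_slot(p, pad32(k))`, while a `string` key would use `mapping_value_slot(p, s.encode())`, matching the unpadded `h(k)` rule.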
Organisation of memory:
- Memory is theoretically unlimited: the EVM does not impose a hard size limit, and physically it is bounded only by the node's RAM. But the gas cost of memory grows quadratically as more of it is used, so in practice it is far from unlimited.
- Solidity reserves the first 128 bytes (four 32-byte slots) for special operations:
Range | Size | Purpose |
---|---|---|
0x00 to 0x3f | 64 Bytes | Scratch space for Hashing |
0x40 to 0x5f | 32 Bytes | Free memory pointer (current allocated memory size) |
0x60 to 0x7f | 32 Bytes | Zero slot |
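The quadratic memory cost and the free memory pointer in the table above can be modelled in a few lines of Python. The gas formula is the Yellow Paper's memory cost, `3 * words + words**2 / 512` (rounded down), charged on the highest memory word touched. `Memory` is a hypothetical toy class that mimics Solidity's bump allocator: read the free memory pointer, hand out memory there, move the pointer forward, never free. Rounding every allocation up to a whole 32-byte word is a simplifying assumption of this sketch.

```python
G_MEMORY = 3            # gas per 32-byte word (Yellow Paper G_memory)
INITIAL_FREE_PTR = 0x80  # first byte after the 128 reserved bytes


def memory_cost(size_bytes: int) -> int:
    """Total memory gas for memory grown to size_bytes: 3*words + words**2 // 512."""
    words = (size_bytes + 31) // 32
    return G_MEMORY * words + words * words // 512


def expansion_cost(old_size: int, new_size: int) -> int:
    """Gas charged when the highest touched byte moves from old_size to new_size."""
    return max(0, memory_cost(new_size) - memory_cost(old_size))


class Memory:
    """Toy bump allocator mimicking Solidity: allocate at the free pointer, never free."""

    def __init__(self):
        self.free_ptr = INITIAL_FREE_PTR  # the value Solidity keeps at 0x40

    def allocate(self, size: int) -> int:
        ptr = self.free_ptr
        # Assumption of this sketch: round every allocation up to whole words.
        self.free_ptr += (size + 31) // 32 * 32
        return ptr


print(memory_cost(32 * 1024), memory_cost(32 * 2048))  # 5120 14336
```

With 1024 words of memory the total memory cost is 5120 gas; doubling to 2048 words costs 14336 gas, almost three times as much, which is the quadratic term at work.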
- The zero slot is used as the initial value for dynamic memory arrays. In memory, a dynamic array stores its length in its first 32-byte word, with the elements directly after it; unlike storage, nothing is hashed in memory. An uninitialised dynamic memory array points at `0x60`, whose all-zero content reads as length 0.
- So we should never write to the zero slot. But don't worry: at a high level, the free memory pointer points to `0x80` initially, so unless we do something weird in assembly, we're cool.
- New objects are placed at the memory pointed to by the free memory pointer. Allocated memory is never freed, although this might change in future Solidity releases.
- Elements in memory arrays always occupy multiples of 32 bytes, which is highly inefficient for small types. So, while `uint8[4]` occupies a single 32-byte slot in storage, it occupies 128 bytes in memory, because each element takes a full 32 bytes regardless of how small it is. This applies to structs too. So, in both memory and calldata, packing is absent (`bytes` and `string` are the exception, see below).
- Recent compiler versions, from `0.8.13`, can use memory to prevent "stack too deep" errors: when compiling through the IR pipeline (`--via-ir`), the compiler may move some stack variables into memory and apply additional memory optimisations.
- This feature requires that the code be memory safe. Plain Solidity code is always considered memory safe, but `inline assembly` blocks need to be declared memory safe explicitly using `assembly ("memory-safe") { ... }`. The compiler does NOT check that the block actually is memory safe; it enables the functionality above on the programmer's promise. Declaring a block memory safe when it is not leads to undefined behaviour.
- **Q. What is a memory-safe assembly block?** TL;DR: a block that respects Solidity's memory layout. It is memory safe when it only accesses the following memory:
  - Memory allocated by yourself respecting the Solidity layout, i.e. reading the free memory pointer at `0x40` and updating (incrementing) it correctly after any allocation.
  - Memory allocated by Solidity, e.g. memory within the bounds of a memory array you reference.
  - The scratch space between memory offsets 0 and 64 mentioned above.
  - Temporary memory that is located after the value of the free memory pointer at the beginning of the assembly block, i.e. memory that is "allocated" at the free memory pointer without updating the free memory pointer.
  - Note that you do not need to update the free memory pointer if no allocation follows the assembly block, but you may still only use memory starting from the current free memory pointer. Memory is not wiped when an internal function returns, so being the last code in a function does not by itself make a block safe.
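The list above boils down to a range check. The toy Python predicate below, `is_memory_safe_access` (a hypothetical name), classifies a single memory access `[start, start + length)` given the free memory pointer read at the start of the block and any ranges Solidity already allocated. It is only a mental model: the real compiler performs no such check and simply trusts the `memory-safe` annotation.

```python
SCRATCH_END = 0x40  # scratch space is bytes [0x00, 0x40)


def is_memory_safe_access(start, length, free_ptr_at_entry, solidity_allocated=()):
    """Toy check of one access to memory [start, start + length).

    Hypothetical helper. Allowed regions, mirroring the memory-safety list:
    - the scratch space [0x00, 0x40);
    - memory at or after the free memory pointer read at the start of the block
      (which covers memory you allocate yourself by bumping that pointer);
    - ranges Solidity already allocated, passed as half-open (lo, hi) pairs,
      e.g. the bounds of a memory array you reference.
    """
    end = start + length
    if end <= SCRATCH_END:
        return True
    if start >= free_ptr_at_entry:
        return True
    return any(lo <= start and end <= hi for lo, hi in solidity_allocated)
```

For instance, `is_memory_safe_access(0, 100, 0x80)` is `False`: copying 100 bytes of return data to offset 0 spills past the 64-byte scratch space, a pattern the Solidity documentation flags as not memory safe. Copying to the free memory pointer instead, `is_memory_safe_access(0x80, 100, 0x80)`, is `True`.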
`bytes`, `string` vs `bytes1[]`:
When used in memory:
In memory, due to the absence of packing, `bytes1[]` has the worst efficiency: each element is placed in its own slot, i.e. 1 byte is used out of a whole 32-byte slot. `bytes` and `string`, on the other hand, are not treated as ordinary arrays and keep their tight packing even in memory. The same holds in calldata.
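To put numbers on the difference, here are the memory footprints implied by the rules above, as a small Python sketch. The function names are hypothetical, and the `bytes`/`string` size is rounded up to whole 32-byte words, an assumption about how much memory gets reserved rather than how many bytes actually hold data.

```python
WORD = 32  # every memory array element / length prefix takes one 32-byte word


def static_array_memory(length: int) -> int:
    """T[length] in memory: one full word per element, no length prefix."""
    return length * WORD


def dynamic_array_memory(length: int) -> int:
    """T[] in memory, including bytes1[]: a length word plus one word per element."""
    return WORD + length * WORD


def bytes_memory(length: int) -> int:
    """bytes/string in memory: a length word plus tightly packed data,
    rounded up to whole words (assumption of this sketch)."""
    return WORD + (length + WORD - 1) // WORD * WORD
```

`static_array_memory(4)` gives 128 bytes for `uint8[4]`, while 64 bytes of data take 2080 bytes of memory as `bytes1[]` (`dynamic_array_memory(64)`) but only 96 bytes as `bytes` (`bytes_memory(64)`).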
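When used in storage:
While `bytes1[]` is stored in storage according to the general array packing rules (32 elements per slot, data starting at `keccak256(p)`), the storage encoding of `bytes` and `string` depends on the length:
- If the data is at most 31 bytes long, it is stored in slot `p` itself, together with the length: the data sits left-aligned in the higher-order bytes, and the lowest-order byte stores `length * 2`.
- If the data is 32 bytes or longer, slot `p` stores `length * 2 + 1` and, just like an array, the data is stored starting at `keccak256(p)`, i.e. the hash of the initial length/array slot.

The lowest bit of slot `p` therefore tells you whether the short or the long encoding is in use.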
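The short/long encoding of slot `p` can be sketched in Python. `bytes_main_slot`, `decode_length` and `long_data_slots` are hypothetical names; the sketch only computes what goes into slot `p` and how many slots the long form uses from `keccak256(p)` onwards, without computing the hash itself.

```python
def bytes_main_slot(data: bytes) -> int:
    """Value stored in slot p for a `bytes`/`string` state variable."""
    n = len(data)
    if n <= 31:
        # Short form: data left-aligned in the high-order bytes,
        # lowest-order byte = length * 2 (so the lowest bit is 0).
        return int.from_bytes(data.ljust(31, b"\x00") + bytes([2 * n]), "big")
    # Long form: slot p holds length * 2 + 1; data lives from keccak256(p) on.
    return 2 * n + 1


def decode_length(slot_value: int):
    """Return (length, is_long) recovered from the value in slot p."""
    if slot_value & 1:
        return (slot_value - 1) // 2, True
    return (slot_value & 0xFF) // 2, False


def long_data_slots(length: int) -> int:
    """Slots used from keccak256(p) onwards (0 for the short form)."""
    return (length + 31) // 32 if length > 31 else 0
```

For example, `bytes_main_slot(b"abc")` is `0x616263` followed by 28 zero bytes and a final `0x06` (3 * 2), while a 32-byte value stores `65` (32 * 2 + 1) in slot `p` and uses one slot of data at `keccak256(p)`.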