Solidity Internals


Internals: This doc explains the internals of data locations. General Notes mentions some of the differences; this doc goes into more detail.

Internally, data locations are of 4 types.

storage: The persistent store on the blockchain. State variables live here and persist after execution ends. It is the most expensive location to read and write in terms of gas.
memory: Temporary data that exists only during execution. It can only be created inside functions as local variables, and it lives in the RAM of the node executing the transaction. It is wiped when the external (message) call ends; internal function calls within that call share the same memory. Costs less gas than storage.
calldata: The read-only input data sent with a call. All the parameters the caller intended to give the contract are located in the calldata. For a transaction, it is the data field.
stack: The EVM stack. It is also the default location for value-type local variables inside a function.

Organisation of `storage` :

Storage is organised in 32-byte slots.
Value types such as uint8, bool and address are tightly packed: consecutive variables share a slot when they fit.
So, uint128 followed by uint128 are packed side by side in one 32-byte slot.
A variable that does not fit in the space left over is moved to a new slot.
So, uint128 followed by uint256 takes up 2 slots instead of one: uint256 can't fit in the 16 bytes left in the first slot, so a second 32-byte slot is used.
Structs and fixed-size arrays always start in a new slot, and their members are packed by the same rules. Whatever is declared after a struct or array also starts in a new slot; it does not reuse the space left over.
Note that the contents of dynamic arrays and mappings are NOT stored in line with the fixed-size state variables.
Each dynamic array or mapping takes exactly one slot at its position in declaration order. Call that slot p; the contents live elsewhere, at slots derived by hashing p.
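The packing rules above can be checked with a small sketch. The contract and variable names here are hypothetical, chosen only for illustration; the slot numbers in the comments follow from the rules in this section, and `.slot` / `.offset` are the inline assembly accessors for state variables.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, used only to illustrate slot assignment.
contract PackingSketch {
    uint128 a; // slot 0, lower 16 bytes
    uint128 b; // slot 0, upper 16 bytes (packed next to a)
    uint128 c; // slot 1, lower 16 bytes
    uint256 d; // slot 2 (does not fit in the 16 bytes left in slot 1)
    uint8 e;   // slot 3, byte offset 0
    bool f;    // slot 3, byte offset 1 (packed next to e)

    // Returns the slot number and the byte offset inside that slot
    // that the compiler assigned to `f`.
    function locationOfF() external view returns (uint256 slotOfF, uint256 offsetOfF) {
        assembly {
            slotOfF := f.slot
            offsetOfF := f.offset
        }
    }
}
```

Reordering the declarations so that `c` sits next to `e` and `f` would save a slot, which is the usual reason to care about this layout.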

Dynamic Arrays : Say the array itself was assigned the pth 32-byte slot. This slot contains the number of elements in the dynamic array, i.e. its length. p here is a slot number, not a memory address. The actual array data starts at slot keccak256(p). That is, while slot p contains the number of elements, the hash of p is where the array actually starts. The elements themselves are stored following the general rules above. Fits? Pack 'em. Doesn't fit? New slot. bytes and string are a special case; check out bytes, string vs bytes1[] below.
Mappings : Say the mapping itself was assigned the pth slot. This slot **stays empty**. Let's say we want to access the element with k as key. Its slot is

keccak256(h(k) . p) where . is concatenation and h(k) is a function based on actual type of key.

If the key is a value type (including fixed-size bytesX), h(k) pads k to a 32-byte value.

If the key is a string or bytes, h(k) is just the unpadded data; it is not hashed separately.
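As a minimal sketch of the formula for a value-type key, the contract below recomputes the slot by hand and reads it with `sload`. The contract, the `filler` variable and the function names are hypothetical. `abi.encode` pads both the key and p to 32 bytes, which matches h(k) . p for a value-type key.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, used only to illustrate the mapping slot formula.
contract MappingSlotSketch {
    uint256 filler;                       // slot 0
    mapping(address => uint256) balances; // slot 1, so p = 1; the slot itself stays empty

    function set(address who, uint256 amount) external {
        balances[who] = amount;
    }

    // Reads balances[who] without going through the mapping syntax.
    function readRaw(address who) external view returns (uint256 value) {
        uint256 p;
        assembly {
            p := balances.slot
        }
        // keccak256(h(k) . p): the key padded to 32 bytes, then p as 32 bytes.
        bytes32 location = keccak256(abi.encode(who, p));
        assembly {
            value := sload(location)
        }
    }
}
```

After `set(x, 42)`, `readRaw(x)` should return the same value as `balances[x]`. For a string or bytes key, the sketch would need `abi.encodePacked(key, p)` instead, since the key is used unpadded.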

Remember that mappings here are different from maps in C++ and other languages. Other languages store the keys of a map internally and look values up through them; Solidity does not store keys at all. Wait, what?

Yeah, Solidity does not store the (key, value) pair. It stores only the value, at the slot keccak256(h(key) . p). Hence, at a low level, even Solidity has no information about our keys. A lookup with a key that was never set simply reads an untouched slot and returns the type's default value (zero).

Hence, you can't delete or iterate a whole mapping, because Solidity has no list of keys to walk. Individual entries can be deleted when we know their keys: delete m[k] resets that slot to 0. To clear a mapping, we have to keep track of the keys ourselves.
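One common workaround is to record the keys by hand. The sketch below is only one way to do it; every name in it is hypothetical, and the loop in `clear` costs gas in proportion to the number of keys, so it is not suitable for unbounded sets.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract: a mapping that can be cleared because
// the keys are tracked in a separate array.
contract ClearableMappingSketch {
    mapping(address => uint256) private scores;
    mapping(address => bool) private seen; // avoids recording a key twice
    address[] private keys;                // keys recorded by hand

    function set(address who, uint256 score) external {
        if (!seen[who]) {
            seen[who] = true;
            keys.push(who);
        }
        scores[who] = score;
    }

    function get(address who) external view returns (uint256) {
        return scores[who]; // 0 for keys that were never set
    }

    // Resets every recorded entry to its default value.
    function clear() external {
        for (uint256 i = 0; i < keys.length; i++) {
            delete scores[keys[i]];
            delete seen[keys[i]];
        }
        delete keys;
    }
}
```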

Note : Ring any bells? The whole point is to avoid collisions: dynamic types can grow and shrink, so their data can't sit between fixed-size variables. That's why the slot p of a dynamic type is hashed, which places its data at a pseudo-random slot far away from the fixed types and from other dynamic types.
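The dynamic array rule from this section can be sketched the same way. The contract below is hypothetical and uses uint256 elements, so each element takes a full slot and element i sits at slot keccak256(p) + i. Smaller element types would be packed and need extra offset arithmetic that this sketch leaves out.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, used only to illustrate where array data lives.
contract DynamicArraySlotSketch {
    uint256[] items; // slot 0 holds only the length

    function push(uint256 v) external {
        items.push(v);
    }

    // Reads the length from slot p and one element from keccak256(p) + index.
    function readRaw(uint256 index) external view returns (uint256 length, uint256 value) {
        uint256 p;
        assembly {
            p := items.slot
            length := sload(p)
        }
        require(index < length, "index out of range");
        uint256 base = uint256(keccak256(abi.encode(p)));
        assembly {
            value := sload(add(base, index))
        }
    }
}
```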

Organisation of `memory` :

Memory is theoretically unlimited: the EVM does not impose a hard cap. In practice, the gas cost of expanding memory grows quadratically, so it is far from unlimited.
Solidity reserves the first 128 bytes (four 32-byte slots) for special purposes.

| Range | Size | Purpose |
| --- | --- | --- |
| 0x00 to 0x3f | 64 Bytes | Scratch space for hashing |
| 0x40 to 0x5f | 32 Bytes | Free memory pointer (current allocated memory size) |
| 0x60 to 0x7f | 32 Bytes | Zero slot |

The zero slot is used as the initial value for dynamic memory arrays. A memory array begins with a 32-byte length word, so an empty array can simply point at the zero slot and read as length 0. Unlike storage, memory involves no hashing: the elements follow the length word directly.
So we should never write to the zero slot. But don't worry: at a high level, the free memory pointer initially points to 0x80, so unless we do something weird in assembly, we're cool.
New objects are placed at the address held by the free memory pointer. Allocated memory is never freed. This might change in future Solidity releases.
Elements in memory arrays always occupy multiples of 32 bytes. Highly inefficient. So, while uint8[4] occupies 4 bytes (packed into a single slot) in storage, it occupies 128 bytes in memory because each element takes 32 bytes regardless of how small it is. The same applies to struct members. So, in both memory and calldata, packing is absent.
Compiler versions from 0.8.13 onwards can use memory to avoid stack too deep errors, by moving some stack variables into memory, and to apply some extra memory optimisations. This mainly concerns the IR-based code generator.
This extra feature requires the code to be memory safe. Plain Solidity code is considered memory safe by default, but inline assembly blocks need to be declared explicitly using assembly ("memory-safe") { ... }. The compiler does NOT check that the block actually is memory safe; it turns on the functionality above on the programmer's promise. Declaring a block memory safe when it is not leads to undefined behaviour.
- **Q. What is a memory safe assembly block?** TL;DR: a block that respects the Solidity memory layout.
- It's memory safe when it only accesses the following memory ranges:
  1. Memory allocated by yourself while respecting the Solidity layout, i.e. reading the free memory pointer at 0x40 and updating (incrementing) it correctly after any allocation.
  2. Memory allocated by Solidity, e.g. memory within the bounds of a memory array you reference.
  3. The scratch space between memory offset 0 and 64 mentioned above.
  4. Temporary memory that is located after the value of the free memory pointer at the beginning of the assembly block, i.e. memory that is “allocated” at the free memory pointer without updating the free memory pointer.
  5. As a special case of 4, a block with no allocation after it, e.g. one that ends in return or revert, does not need to update the free memory pointer. It must still only use memory starting at the current free memory pointer.
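A minimal sketch of rule 1, assuming compiler version 0.8.13 or later for the `("memory-safe")` annotation. The contract and function names are hypothetical, and rounding the size up to a multiple of 32 is a choice made here to keep the pointer word-aligned, not something the rule itself requires.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.13;

// Hypothetical contract, used only to illustrate a memory safe allocation.
contract MemorySafeSketch {
    // Reserves `size` bytes: reads the free memory pointer at 0x40,
    // treats the region it points to as ours, then bumps the pointer past it.
    // Returns the start and end offsets of the reserved region.
    function allocate(uint256 size) external pure returns (uint256 start, uint256 end) {
        assembly ("memory-safe") {
            start := mload(0x40)
            // Round up to a multiple of 32 (an assumption of this sketch).
            let rounded := and(add(size, 31), not(31))
            end := add(start, rounded)
            // Publish the new free memory pointer so later allocations
            // made by Solidity do not overlap this region.
            mstore(0x40, end)
        }
    }
}
```

The sketch does no overflow check on `size`, so a real allocator would want to bound it. Writing to the zero slot, or writing past `end` without bumping the pointer first, would break the promise made by the annotation.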

`bytes, string` vs `bytes1[]`:

When used in `memory`:

In memory, due to the absence of packing, bytes1[] has the worst efficiency: each element is placed in its own slot, i.e. 1 byte is used out of a whole 32-byte slot. bytes and string, on the other hand, are not treated as regular arrays and stay tightly packed even in memory. The same holds in calldata.

When used in `storage`:

While bytes1[] is stored in storage according to the general array packing rules, the storage encoding of bytes (and string) depends on the length. If the data is at most 31 bytes long, it is stored in slot p itself: the data sits in the higher-order bytes and the lowest-order byte holds length * 2. If the data is 32 bytes or longer, slot p holds length * 2 + 1 and the data is stored at keccak256(p), the same way as for an array.
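The two encodings can be told apart by the lowest bit of slot p. The sketch below reads the raw slot and decodes the length; the contract and function names are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, used only to illustrate the short and long
// storage encodings of `bytes`.
contract BytesEncodingSketch {
    bytes data; // slot 0

    function set(bytes calldata value) external {
        data = value;
    }

    // Decodes the length straight from the raw slot.
    function inspect() external view returns (bool isLong, uint256 length) {
        uint256 raw;
        assembly {
            raw := sload(data.slot)
        }
        // Lowest bit set: long form, the slot holds length * 2 + 1.
        // Lowest bit clear: short form, the lowest byte holds length * 2.
        isLong = (raw & 1) == 1;
        length = isLong ? (raw - 1) / 2 : (raw & 0xff) / 2;
    }
}
```

In both cases `length` should agree with `data.length`; only the place where the payload lives differs.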
