Storage is one of the four data locations available to a Solidity smart contract (the others are memory, calldata and the stack). Put simply, it is the “database” associated with the smart contract: its values persist after a transaction finishes, which is why it holds the contract’s “state” variables.
Storage definition
Each smart contract’s storage contains 2**256 slots of 32 bytes each (a finite number, but far more than could ever be used in practice). A slot is the basic unit of storage: reads and writes always operate on whole slots, never on individual bytes.
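To make the notion of a slot concrete, here is a minimal sketch that reads one raw 32-byte slot with the `sload` opcode. The contract name `SlotReader`, the variable `value` and the helper `readSlot` are hypothetical names chosen for this illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract, not part of any library.
contract SlotReader {
    // First state variable of the contract, so it lives in slot 0.
    uint256 private value = 42;

    // Returns the raw 32-byte word stored at position `slot`.
    // Storage is always read one whole slot at a time.
    function readSlot(uint256 slot) external view returns (bytes32 word) {
        assembly {
            word := sload(slot)
        }
    }
}
```

Calling `readSlot(0)` should return the 32-byte encoding of 42, while any slot that was never written reads as zero.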
Variables visibility
Storage variables (also called “state” variables) can have the following visibility definitions:
- Public: when a state variable is declared “public”, Solidity automatically generates an external function that returns its value (a “getter” function). The variable is also accessible from any smart contract that inherits from the contract where it is defined (derived contracts).
- Internal: internal variables behave exactly like public variables, except that Solidity does not generate the “getter” function. This is the default visibility.
- Private: private variables are NOT accessible from derived contracts.
It is important to note that off-chain applications can read storage variables regardless of their visibility: “private” variables are not really private.
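The three visibilities can be sketched as follows. The contract and variable names (`Base`, `Derived`, `pub`, `inter`, `priv`) are made up for this example.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contracts used only to illustrate visibility.
contract Base {
    uint256 public pub = 1;     // the compiler generates an external getter pub()
    uint256 internal inter = 2; // no getter, but visible in derived contracts
    uint256 private priv = 3;   // no getter, not visible in derived contracts
}

contract Derived is Base {
    function sum() external view returns (uint256) {
        // Referencing `priv` here would be a compile-time error.
        return pub + inter; // 1 + 2
    }
}
```

Visibility only restricts access from other Solidity code. The value of `priv` still sits in slot 2 of the deployed contract, and any off-chain client can fetch it with the standard `eth_getStorageAt` JSON-RPC call.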
Storage layout
Variables can be stored on storage in different ways, depending on their data type, the order in which they were defined and sometimes even on their value.
Value types
Value types (uint256, address, bool, …) are stored in the order they are defined.
By default each variable takes one full slot. However, if a variable’s data type is shorter than 32 bytes and the variable defined right before or after it is also shorter than 32 bytes, Solidity packs them into a single slot, provided their combined length is at most 32 bytes.
Fixed-size arrays are stored in a similar way, with the only peculiarity that they cannot be packed with any variable defined before or after them.
For instance:
uint256 u1; // 32 bytes long
uint256 u2; // 32 bytes long
address a1; // 20 bytes long
bool b1; // 1 byte long
address a2; // 20 bytes long
bool[5] arr1; // 5 bytes long
bool b2; // 1 byte long
u1 and u2 each take their own slot (slot 0 and slot 1).
a1 and b1 are packed into a single slot, since their combined length is 21 bytes (slot 2).
a2 takes its own slot too, since it does not fit in the remaining 11 bytes of slot 2 (slot 3).
arr1 starts in a new slot because it is a fixed-size array (even though it could technically fit in the 12 bytes remaining in the previous slot), and its values are packed together (slot 4).
b2 starts in a new slot (despite being only 1 byte long, which could fit into the previous slot) because the previous variable is a fixed-size array (slot 5).
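The layout described above can be checked from inside the contract: inline assembly exposes the `.slot` and `.offset` of every state variable. The sketch below reuses the variables from the example; the contract name `PackingDemo` and the function `positions` are assumptions made for the illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract wrapping the variables from the example above.
contract PackingDemo {
    uint256 u1;   // slot 0
    uint256 u2;   // slot 1
    address a1;   // slot 2, bytes 0..19
    bool b1;      // slot 2, byte 20 (packed with a1)
    address a2;   // slot 3
    bool[5] arr1; // slot 4 (arrays always start a new slot)
    bool b2;      // slot 5 (cannot share a slot with the array)

    // Reports where the compiler placed some of the variables.
    function positions()
        external
        view
        returns (uint256 b1Slot, uint256 b1Offset, uint256 arr1Slot, uint256 b2Slot)
    {
        assembly {
            b1Slot := b1.slot
            b1Offset := b1.offset // byte offset inside the slot
            arr1Slot := arr1.slot
            b2Slot := b2.slot
        }
    }
}
```

If the walkthrough above is right, `positions()` should report slot 2 with offset 20 for `b1`, slot 4 for `arr1` and slot 5 for `b2`.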
Reference Types
Reference types are: dynamic-size arrays, mappings and structures.
- Dynamic-size arrays: They take their “own” slot (like a uint256 would), which contains the length of the array. The values of the array are stored in order starting at position keccak256(p) and can be packed together (where “p” is the position of the array’s “own” slot).
- Mappings: Stored in a similar way to dynamic-size arrays, except that the mapping’s “own” slot does not store anything (unlike arrays, mappings have no length) and each value is stored at position keccak256(h(k) . p) (where “p” is the position of the mapping’s “own” slot, “k” is the key being accessed, “h” is a padding function for keys that are shorter than 32 bytes and “.” is concatenation). Mapping values end up scattered all over the storage, which is why packing them is not possible.
- Structures: Structures are stored in exactly the same way as fixed-size arrays: they start on a new slot, and the next variable after the struct starts on a new slot too. Variables inside the structure can be packed together.
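The two formulas above can be sketched in Solidity as follows. The contract `ReferenceSlots` and its functions are hypothetical. The example assumes a uint256 array (one element per slot, so no packing) and an address key, which `abi.encode` left-pads to 32 bytes and which therefore plays the role of “h” here.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, used only to illustrate the position formulas.
contract ReferenceSlots {
    uint256[] internal arr;                   // "own" slot 0 holds arr.length
    mapping(address => uint256) internal map; // "own" slot 1 stays empty

    constructor() {
        arr.push(11);
        arr.push(22);
        map[msg.sender] = 33;
    }

    // Element `index` of `arr` lives at keccak256(p) + index, with p = 0.
    function arrayElement(uint256 index) external view returns (uint256 word) {
        uint256 slot = uint256(keccak256(abi.encode(uint256(0)))) + index;
        assembly {
            word := sload(slot)
        }
    }

    // The value for `key` lives at keccak256(h(key) . p), with p = 1.
    function mappingValue(address key) external view returns (uint256 word) {
        bytes32 slot = keccak256(abi.encode(key, uint256(1)));
        assembly {
            word := sload(slot)
        }
    }
}
```

With this setup, `arrayElement(1)` should return 22, and `mappingValue` called with the deployer’s address should return 33.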
Bytes & Strings
The storage policy for bytes and strings (which are basically fancy bytes) depends on their size.
- Less than or equal to 31 bytes (short bytes/strings): length and value are stored in a single slot, as follows:
Value: stored in the higher-order bytes (left aligned).
Length: stored in the lowest-order byte (rightmost byte) as length * 2.
- More than or equal to 32 bytes (long bytes/strings): stored in a very similar way to dynamic-size arrays, with the only difference that the “own” slot does not store the length but 2 * length + 1. This makes it possible to tell short and long bytes/strings apart by looking at the lowest bit of the “own” slot: if the bit is 0 it is SHORT, if it is 1 it is LONG.
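The short/long rule can be decoded directly from the string’s “own” slot. This is a rough sketch; the contract `StringSlots` and the functions `set` and `decode` are names invented for the example.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract, used only to illustrate the string encoding.
contract StringSlots {
    string internal text = "hello"; // 5 bytes, so the short encoding is used

    function set(string calldata newText) external {
        text = newText;
    }

    // Decodes the "own" slot of `text` following the rules described above.
    function decode() external view returns (bool isLong, uint256 length) {
        uint256 word;
        assembly {
            word := sload(text.slot)
        }
        isLong = (word & 1) == 1; // lowest bit: 0 = short, 1 = long
        length = isLong
            ? (word - 1) / 2     // long: slot holds 2 * length + 1
            : (word & 0xff) / 2; // short: lowest byte holds length * 2
    }
}
```

For the initial value "hello" the lowest byte of the slot is 0x0a (5 * 2), so `decode()` should return `(false, 5)`. After calling `set` with a string of 32 bytes or more, it should report `true` together with the string’s length.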
Inheritance
The Solidity compiler supports multiple inheritance and uses the C3 linearization algorithm (beyond the scope of this blog) to determine the final “linear” hierarchy of parent smart contracts.
State variables defined in parent smart contracts are inherited by their children to form the final storage layout. The order of those variables is determined by the result of the C3 linearization.
contract parent_1
{
uint256 p1_u;
address p1_a;
}
contract parent_2
{
uint256 p2_u;
address p2_a;
}
contract child is parent_1, parent_2
{
uint256 u;
address a;
}
// FINAL STORAGE LAYOUT FOR "child":
uint256 p1_u; // Slot 0
address p1_a; // Slot 1
uint256 p2_u; // Slot 2
address p2_a; // Slot 3
uint256 u; // Slot 4
address a; // Slot 5
This is very important to keep in mind when upgrading smart contracts (using transparent proxies or any other pattern): if we add state variables to parent contracts, or add new contracts with state variables to the inheritance chain, the new state variables may “shift” the existing ones down, which can lead to undesired and unexpected consequences.
contract parent_3
{
uint256 p3_u;
address p3_a;
}
contract child is parent_1, parent_2, parent_3
{
uint256 u;
address a;
}
// FINAL STORAGE LAYOUT FOR "child":
uint256 p1_u; // Slot 0
address p1_a; // Slot 1
uint256 p2_u; // Slot 2
address p2_a; // Slot 3
uint256 p3_u; // Slot 4 : p3_u will contain the value u had
address p3_a; // Slot 5 : p3_a will contain the value a had
uint256 u; // Slot 6 : u will be 0
address a; // Slot 7 : a will be address(0)
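One commonly used mitigation for the first case (adding variables to an existing parent) is to reserve spare slots at the end of each parent, often called the gap pattern. The sketch below shows the idea under simplifying assumptions: the names `ParentV1`, `ParentV2` and `p_new` and the gap size of 48 are arbitrary choices made for the illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical first version of a parent: 2 used slots + 48 reserved = 50 slots.
contract ParentV1 {
    uint256 p_u;               // slot 0
    address p_a;               // slot 1
    uint256[48] private __gap; // slots 2..49, reserved for future variables
}

// Hypothetical upgraded version: the new variable takes the first reserved
// slot and the gap shrinks by one, so the parent still spans 50 slots.
contract ParentV2 {
    uint256 p_u;               // slot 0
    address p_a;               // slot 1
    uint256 p_new;             // slot 2, previously part of the gap
    uint256[47] private __gap; // slots 3..49
}
```

Because both versions occupy exactly 50 slots, the variables of any contract inheriting from the parent start at slot 50 before and after the upgrade. This does not help when a brand new parent is inserted in the middle of the inheritance list, as in the example above.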
Gas cost
Saving data in storage means saving data on the blockchain forever (or until you remove it), which is why dealing with storage in Ethereum is very expensive in terms of gas.
Removing data from storage, on the other hand, allows some of the transaction’s gas to be refunded. This is done to encourage developers to “release” storage that is no longer needed.
Another important gas-related policy to keep in mind when dealing with storage is the concept of “cold” and “warm” accesses. Since EIP-2929, the EVM distinguishes between the first time a storage slot is accessed within a transaction (cold access, whether it is a read or a write) and every later access (warm access):
- Reading from storage (SLOAD opcode only): a cold read costs 2'100 gas, a warm read costs 100 gas.
- Writing to storage (SSTORE opcode only): setting a variable from 0 to a non-zero value costs 22'100 gas for a cold write or 20'000 for a warm write. Changing a previously set (non-zero) value costs 5'000 for a cold write and 2'900 for a warm write. If the value we are writing is the same one the variable already has, the cost is only 2'200 for a cold write and 100 for a warm write. These figures apply to the first change made to a slot within a transaction; once a slot has already been modified in the transaction, further writes to it cost only 100 gas each (EIP-2200 net gas metering).
- Refunds: resetting a storage variable from a non-zero value to 0 earns a refund of 4'800 gas per reset variable, and the total refund is capped at 20% of the gas used by the transaction (EIP-3529).
/* Disclaimer : this code only illustrates the gas costs,
it does not make much sense by itself... */
contract GasCostDemo
{
uint256 _u1;
function gasCostRead() external view returns (uint256)
{
uint256 uCold = _u1; // Cold Read : Sload = 2'100 gas
uint256 uWarm = _u1; // Warm Read : Sload = 100 gas
return uWarm;
}
/* Each comment below gives the cost the write would have as the first
change to _u1 in its transaction. Executed back to back as here, every
write that follows the first change hits an already modified slot and
costs only 100 gas (EIP-2200 net gas metering). */
function gasCostWrite() external
{
_u1 = 3; /* Cold Write : Sstore = 22'100 gas (if _u1 was 0),
5'000 gas (if _u1 was neither 0 nor 3),
2'200 gas (if _u1 was 3) */
_u1 = 4; // Warm Write : Sstore = 2'900 gas because _u1 was 3
_u1 = 4; // Warm Write : Sstore = 100 gas because _u1 was 4 already
_u1 = 0; // Warm Write : Sstore = 2'900 gas because _u1 was 4, plus a 4'800 gas refund (total refund capped at 20%)
_u1 = 3; // Warm Write : Sstore = 20'000 gas because _u1 was 0
}
}
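The refund rule is easier to follow with numbers. Below is a minimal sketch of the arithmetic, assuming the EIP-3529 parameters (4'800 gas per cleared slot, total refund capped at one fifth of the gas used). The contract `RefundMath` and the function `effectiveRefund` are hypothetical, and the function only models slots reset from a non-zero value to 0.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical helper that mirrors the refund arithmetic, nothing more.
contract RefundMath {
    uint256 public constant CLEAR_REFUND = 4800;     // refund per slot reset to 0
    uint256 public constant MAX_REFUND_QUOTIENT = 5; // refund <= gasUsed / 5 (20%)

    // Refund actually granted for a transaction that used `gasUsed` gas
    // and reset `clearedSlots` slots from a non-zero value to 0.
    function effectiveRefund(uint256 gasUsed, uint256 clearedSlots)
        external
        pure
        returns (uint256)
    {
        uint256 earned = clearedSlots * CLEAR_REFUND;
        uint256 cap = gasUsed / MAX_REFUND_QUOTIENT;
        return earned < cap ? earned : cap;
    }
}
```

For instance, a transaction that uses 50'000 gas and resets 3 slots earns 14'400 gas of refunds, but the cap is 10'000, so only 10'000 gas is actually refunded.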
Best Practices
When dealing with storage variables:
- Secure write access: Be sure who has the right to change each and every storage variable, since this affects your contract’s state.
- Store as little as possible: Storing data in storage is pretty expensive, as we saw earlier. To make your contract as cheap as possible, only save in it information that cannot be stored anywhere else.
- Define your storage layout pattern in advance: Upgrading smart contracts can lead to overlapping storage variables, which can render your contracts useless. Before deploying even the first version, make sure you are using a layout pattern that will let you upgrade it painlessly in the future (check: eternal storage, unstructured storage, gap pattern, …).
- Access data as little as possible: Accessing storage variables is also expensive, even though warm reads are relatively cheap. As a general rule of thumb, if you are going to read the same storage variable multiple times within the same transaction (and NOT modify it), copy it into a local variable and read it from there.
- Packing variables: Defining variables that are shorter than 32 bytes next to each other can lead Solidity to pack them together, which is a good idea if you use them at the same time (within the same transaction) because it helps you save gas. HOWEVER, if these variables are completely “unrelated”, using them can turn out to be even more expensive than if they were not packed together. This happens because the EVM works with 32-byte words, so extracting a single variable from a packed slot requires some extra operations.
- Set to zero whenever it makes sense and is possible: We saw earlier that resetting a storage variable to zero entitles the user who submitted the transaction to a refund, which means a lower gas cost. It can thus be a good idea to reset variables when it makes sense from a business point of view. HOWEVER, resetting too many variables can lead to the opposite result: if the refund is already at the 20% cap, one extra reset can only refund 20% of the cost added by the reset operation itself, so the total cost of the transaction actually increases by 80% of that added cost.
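The “access data as little as possible” advice can be sketched as follows. The contract `CachingDemo` and its functions are hypothetical, and the actual savings depend on the compiler version and optimizer settings.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract comparing repeated storage reads with a cached copy.
contract CachingDemo {
    uint256 internal limit = 10;

    // Reads `limit` from storage on every iteration of the loop:
    // one cold SLOAD followed by warm ones.
    function sumUncached() external view returns (uint256 total) {
        for (uint256 i = 0; i < limit; i++) {
            total += i;
        }
    }

    // Reads `limit` from storage once and then works on the local copy.
    function sumCached() external view returns (uint256 total) {
        uint256 cachedLimit = limit; // single SLOAD
        for (uint256 i = 0; i < cachedLimit; i++) {
            total += i;
        }
    }
}
```

Both functions return the same result (45 for a limit of 10), but the cached version replaces the repeated warm reads with cheap stack accesses.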