Danksharding is an approach to scaling the amount of data available on chain in a future version of Ethereum. The goal of the upgrade is to ensure that data posted on chain is made available to archiving parties at the time it is first posted. It does so using a technique called Data Availability Sampling, or simply DAS.
In this post we survey how data availability in danksharding works and propose some modifications to the underlying technique. In particular, we explore a small modification that may improve data recovery: the current proposal requires 75% of the shares to recover a block, whereas the modification may reduce this bound to 25%.
Protodanksharding
Danksharding is scheduled to follow protodanksharding (EIP-4844). Protodanksharding will allow clients to write more data to the blockchain by introducing a new transaction type called a "blob-carrying transaction." Initially, this transaction type will carry a list of up to four data blobs, where each blob is at most 128 KB (4096 field elements of 32 bytes each), adding up to 512 KB of additional data per block (targeting 256 KB on average). For comparison, the current Ethereum block size is around 100 KB on average.
These data blobs will be handled differently: (i) they will only be stored for a limited time, for example, 30-60 days, and (ii) the data will not be directly accessible to smart contracts despite being part of the transaction data. Instead, only a short commitment to the blob data, called DATAHASH, will be available to smart contracts. The additional burden on the validators seems tolerable: validators currently store less than 100 GB to maintain the blockchain’s state. After protodanksharding they will have to store an additional 50-100 GB.
Danksharding will follow. It will increase the data available to clients by 60x over protodanksharding by increasing the maximum number of blobs per block. Blocks will grow from 0.5 MB per block to 30 MB. But because validators cannot be made to store 60x more data, the data will be dispersed among them such that each validator will store only a small subset of data. Yet they can come to agreement on whether they collectively store all of the data through a data-availability sampling (DAS) protocol.
The blob data will be priced via a mechanism similar to EIP-1559, targeting about 1 data-gas per byte. Calldata, currently the cheapest alternative, is priced at 16 gas per byte. But since there will be two separate fee markets, these costs are not directly comparable. Roll-ups and their clients will benefit from these upgrades, because currently more than 90% of roll-up user fees go toward paying Ethereum data fees.
Other projects, such as Celestia and EigenLayer, are also adopting DAS techniques to increase the available data space. These designs are much simpler than fully sharding the Ethereum network.
The goal of data availability sampling
We describe the scheme assuming a proposer-builder separation (PBS) design:
- Clients submit their blob-carrying transactions to block builders.
- A block builder forms a block B by selecting N client data-blobs. Data-blob number i comes with a short commitment Ci signed by the client that sent it. Let C = (C1, . . . , CN) be the list of all N signed commitments in the block B.
- Block builders submit their proposed blocks to the current block proposer (one of the validators). The block proposer selects one of the blocks and posts it to the network as is.
The challenge is to ensure that the block B can be reconstructed at a later time. To do so, the builder replicates the block across a large network of V validators. One could ask each and every validator to store the entire block, but that is considered too expensive. Instead, the block builder
- encodes the block B into a larger block E using erasure coding;
- breaks the block E into V overlapping fragments P1, . . . , PV;
- sends to validator number i the pair (Pi, C).
Every validator checks that the fragment Pi that it received is consistent with the list C of signed commitments. The block builder provides proofs for the validators to facilitate these checks.
With this setup, a Data Availability Sampling scheme has two protocols:
- A sampling protocol runs between a sampling verifier and the validator set. The sampling verifier takes the list C as input and requests randomly selected elements of the block E from the validator set. The sampling verifier outputs success if it received all the requested elements and all are consistent with C.
- A reconstruction protocol runs between a reconstruction agent and the validator set. The reconstruction agent takes C as input and requests elements of block E from the validator set. Once it collects more than 75% of the elements, and all are valid, the reconstruction agent computes and outputs the block B.
(We discuss an approach below that may reduce the number of elements needed for reconstruction.)
The requirement is that if the sampling verifier outputs success, then the reconstruction agent will output the block B, provided it receives over three quarters of the elements as input. Reconstruction should succeed as long as enough elements are provided, even if the provided elements are selected adversarially.
To recap, the following parties participate in danksharding:
- Client: sends data blobs (which are either transactions or bundles) to a builder.
- Builder: makes a block and sends fragments of this block to validators.
- Block proposer (one of the validators): posts the block to the network.
- Sampling verifier (any of the validators): runs the sampling protocol and signs the block header if the protocol outputs success.
- Reconstruction agent: reconstructs a previously posted block when needed by interacting with the entire validator set. Reconstruction succeeds if the validators respond with more than three-quarters of valid elements.
Erasure coding and polynomial commitments
There are two building blocks to the scheme that we explain next: erasure coding and polynomial commitments.
Building block #1: Erasure-coding
Erasure coding dates back to the 1960s, motivated by the need to transmit information over a lossy channel. It is used in danksharding to protect against validators that lose their data fragments. The technique expands the data from N elements to M elements (M > N) such that the original data can be reconstructed from any N intact elements of the expanded data. Imagine taking N elements (the original data), encoding them into M = 2N elements, and giving one encoded element to each of 2N validators. If a majority of the validators are honest, they can always jointly reconstruct the original data. This technique protects against crash faults of up to half of the validators. It can be extended to protect against byzantine behavior of up to half of the validators using the polynomial commitments discussed in the next section.
Here is how the expansion works in more detail. To expand the data from N field elements d1, d2, . . . , dN ∈ 𝔽p to M encoded elements e1, e2, . . . , eM ∈ 𝔽p, we interpolate the unique polynomial p(x) of degree at most N – 1 that satisfies p(i) = di for i = 1, . . . , N. The polynomial is then evaluated at M points: ei = p(i) for i = 1, . . . , M. The original polynomial p(x) can be reconstructed from any N of the M encoded elements using Lagrange interpolation. This expansion method is called Reed-Solomon encoding or low-degree polynomial expansion.
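To make the encoding concrete, here is a minimal Python sketch of Reed-Solomon expansion and recovery via naive Lagrange interpolation over a small prime field. The prime, the element counts, and the helper names (`rs_expand`, `rs_recover`) are illustrative choices of ours; the actual danksharding encoding works over a 256-bit field and uses roots of unity with FFT-based interpolation for efficiency.

```python
# Minimal sketch of Reed-Solomon expansion via naive Lagrange interpolation.
# Illustrative only: a small prime and O(N*M) evaluation, not the
# roots-of-unity / FFT-based encoding used in danksharding.
P = 2**31 - 1  # toy prime, stand-in for the 256-bit prime p

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree <= len(points)-1
    passing through the given (xi, yi) pairs, working mod P."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def rs_expand(data, M):
    """Encode N data elements d_1..d_N into M >= N elements e_i = p(i),
    where p is the unique polynomial with p(i) = d_i for i = 1..N."""
    points = [(i + 1, d % P) for i, d in enumerate(data)]
    return [lagrange_eval(points, x) for x in range(1, M + 1)]

def rs_recover(coords, N):
    """Recover d_1..d_N from any N encoded elements given as (index, value) pairs."""
    return [lagrange_eval(coords, i) for i in range(1, N + 1)]

data = [7, 1, 4, 2]                          # N = 4 original elements
encoded = rs_expand(data, 8)                 # M = 2N encoded elements
survivors = list(enumerate(encoded, 1))[4:]  # any 4 of the 8 suffice
assert rs_recover(survivors, 4) == data
```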
Building block #2: Polynomial commitments
A polynomial commitment scheme is the second building block of a DAS scheme. It allows the scheme to do two things:
- commit to a univariate polynomial p in 𝔽p[X] of bounded degree D using a short commitment string, C.
- open the committed polynomial at a given point x ∈ 𝔽p.
More precisely, given x,y ∈ 𝔽p, the committer can create a short proof π that the committed polynomial p satisfies p(x) = y and that its degree is at most D. The proof π is verified with respect to the commitment string C and the values x, y. This π is called an evaluation proof.
Security guarantees that the commitment is binding to the polynomial and that an adversary cannot create a false proof.
A number of practical polynomial commitment schemes can be used here. Danksharding uses the KZG commitment scheme that requires a trusted setup ceremony to generate public parameters (called an SRS) but has constant-size commitments and constant-size evaluation proofs. KZG commitments have a number of properties that make them especially well suited for danksharding:
- the commitment is homomorphic: if C1 is a commitment to p1 and C2 is a commitment to p2, then C1 + C2 is a commitment to p1 + p2;
- the commitment is unique: if two people independently compute a commitment to a polynomial p, they obtain the same commitment;
- evaluation proofs are homomorphic: for a given x, if π1 is a proof that p1(x) = y1 and π2 is a proof that p2(x) = y2, then π1 + π2 is a proof that (p1 + p2)(x) = y1 + y2.
We can now explain how a client commits to its data-blob. First, the client interprets the data-blob as a vector of m field elements d1, . . . , dm ∈ 𝔽p, for m ≤ 4096. Next, it interpolates a univariate polynomial p ∈ 𝔽p[X] of degree at most m – 1 such that p(i) = di for i = 1, . . . , m. (Technically, danksharding uses 1, ω, ω^2, . . . , ω^(m–1) ∈ 𝔽p as the evaluation points, where ω ∈ 𝔽p is an m-th root of unity, together with a reverse-bit order; this is done for efficiency, but we will not consider it here for simplicity.) Finally, the client constructs the KZG polynomial commitment to the polynomial p. It signs the commitment and sends the commitment-signature pair to the builder. This process requires public parameters (an SRS) containing 4096 group elements.
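To illustrate the algebra behind KZG without the cryptography, here is a toy Python sketch in which a "commitment" is simply the blob polynomial evaluated at a secret point τ, computed in the clear. This is not a secure commitment scheme: real KZG publishes only group elements derived from τ (the SRS) and verifies evaluation proofs with a pairing. The sketch only mirrors the structure described above — commit to the interpolated polynomial, prove an evaluation via the quotient polynomial, and observe the homomorphic property — and the function names are ours.

```python
# Toy sketch of the KZG algebra with the group "removed": a commitment is the
# value p(tau) for a secret tau, computed here in the clear.  This is NOT a
# secure commitment scheme; it only illustrates the structure of commitments,
# evaluation proofs, and the homomorphic property that danksharding relies on.
# (Real KZG publishes only group elements g^(tau^i) and verifies with a pairing.)
import random

P = 2**255 - 19               # illustrative prime field
TAU = random.randrange(1, P)  # trusted-setup secret (never known in the real scheme)

def eval_poly(points, x):
    """Lagrange-evaluate at x the polynomial interpolating the (xi, yi) pairs."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def commit(blob):
    """'Commit' to the blob d_1..d_m by evaluating its interpolating polynomial at TAU."""
    return eval_poly([(i + 1, d % P) for i, d in enumerate(blob)], TAU)

def prove_eval(blob, x):
    """Proof for p(x): the value q(TAU) of the quotient q(X) = (p(X) - p(x)) / (X - x)."""
    points = [(i + 1, d % P) for i, d in enumerate(blob)]
    y = eval_poly(points, x)
    pi = (eval_poly(points, TAU) - y) * pow(TAU - x, -1, P) % P
    return y, pi

def verify(C, x, y, pi):
    """In real KZG this check is done with a pairing; here it is done in the clear."""
    return (C - y) % P == pi * (TAU - x) % P

blob = [random.randrange(P) for _ in range(8)]   # a tiny 8-element "blob"
C = commit(blob)
y, pi = prove_eval(blob, 123456)
assert verify(C, 123456, y, pi)

# Homomorphism: the sum of two commitments commits to the sum of the polynomials.
blob2 = [random.randrange(P) for _ in range(8)]
assert (commit(blob) + commit(blob2)) % P == commit([(a + b) % P for a, b in zip(blob, blob2)])
```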
Data Dispersal
Next, we explain how the block builder encodes a block and splits it into fragments to send to the validators. Fix some 256-bit prime p. The block builder does the following:
input: A block B in danksharding can contain up to 256 data blobs (64 times more blobs than in protodanksharding), where each data blob is a vector of 4096 elements in 𝔽p. Thus, we can represent a block as a 256 × 4096 matrix of elements in 𝔽p. Each row in this matrix corresponds to one data-blob from a client. Recall that each client sends to the builder one row of B along with a signed KZG polynomial commitment for that row. The builder collects 256 signed polynomial commitments C0, . . . , C255, one commitment per row.
step 1: The builder interpolates a bivariate polynomial d(X,Y) such that d(i,j) = B[i,j] for i = 0, . . . , 255 and j = 0, . . . , 4095. This bivariate polynomial has degree at most 255 in X and degree at most 4095 in Y.
step 2: The builder uses the erasure coding method described above to expand the block by a factor of two in each direction. That is, it forms a 512 × 8192 matrix E of field elements by setting E[i,j] ← d(i,j) for i = 0, . . . , 511 and j = 0, . . . , 8191. This is illustrated in the following figure.
step 3: The builder verifies that each signed Ci is a KZG commitment to the univariate polynomial di(Y) := d(i,Y), for all i = 0, . . . , 255. Observe that the polynomial di(Y) is an interpolation of row i of B, and therefore must be the same as the polynomial committed to by client i. The builder rejects all data-blobs for which Ci is malformed.
Now the builder uses C = (C0, . . . , C255) as a commitment to the block B, or more precisely, to the bivariate polynomial d(X,Y).
Let’s show that C = (C0, . . . , C255) is indeed a polynomial commitment to the polynomial d(X,Y). For some given x,y,z ∈ 𝔽p, let us construct an evaluation proof that convinces the verifier that d(x,y) = z with respect to the commitment C. Since the degree of X in d(X,Y) is at most 255, there are constants λ0, . . . , λ255 ∈ 𝔽p that depend only on x such that
d(x,Y) = λ0 · d(0,Y) + . . . + λ255 · d(255,Y).
Then, by the homomorphic property of KZG commitments it follows that
Cx := λ0 · C0 + . . . + λ255 · C255
is a KZG commitment to the univariate polynomial dx(Y) := d(x,Y). Hence, the verifier can construct Cx from C on its own. Let π be the KZG evaluation proof for the polynomial dx(Y) that convinces the verifier that dx(y) = z with respect to the commitment Cx. This π is the required evaluation proof for d(X,Y) at the point (x,y).
This argument shows that C is a polynomial commitment to d(X,Y). It is worth noting that while each client signs a row of B independently of the other clients, the collection of all client signatures functions as a signature on the polynomial commitment to d(X,Y).
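The following sketch shows how a verifier could derive Cx as a linear combination of the 256 row commitments. The Lagrange coefficients λi depend only on x and the interpolation nodes; commitments are treated abstractly as homomorphic values, with integers mod p standing in for the group elements (in the real scheme this combination is a multi-scalar multiplication). The helper names are ours, and the self-check uses a degree-7 toy polynomial in place of the degree-255 one.

```python
# Sketch: deriving the commitment C_x to the row polynomial d(x, Y) as a linear
# combination of the row commitments C_0..C_255.  Commitments are treated
# abstractly as homomorphic values; integers mod P stand in for group elements.
import random

P = 2**255 - 19   # illustrative prime

def lagrange_coeffs(x, nodes):
    """lambda_i(x) for the given nodes, so that f(x) = sum_i lambda_i(x) * f(i)
    for every polynomial f of degree < len(nodes)."""
    lams = []
    for i in nodes:
        num, den = 1, 1
        for j in nodes:
            if j != i:
                num = num * (x - j) % P
                den = den * (i - j) % P
        lams.append(num * pow(den, -1, P) % P)
    return lams

def row_commitment(x, C):
    """C_x := lambda_0 * C_0 + ... + lambda_{255} * C_255 (here with len(C) rows)."""
    lams = lagrange_coeffs(x, range(len(C)))
    return sum(l * c for l, c in zip(lams, C)) % P

# Self-check with toy "commitments": if C_i = f(i) for a degree-7 polynomial f,
# then the combination reproduces f(x) for any x.
coefs = [random.randrange(P) for _ in range(8)]
f = lambda x: sum(c * pow(x, k, P) for k, c in enumerate(coefs)) % P
C = [f(i) for i in range(8)]
x = random.randrange(P)
assert row_commitment(x, C) == f(x)
```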
step 4: The minimum unit of communication in this DAS scheme is a sample: a row vector of 16 consecutive field elements. The matrix E of 512 × 8192 field elements is treated as a 512 × 512 square matrix of samples. Let V be the number of validators. The block builder breaks the matrix E into V overlapping fragments P1, . . . , PV, where Pi comprises exactly two rows and two columns of samples of E, chosen at random from the 512 rows and 512 columns of samples. Thus, Pi contains 2 × 512 × 16 + 2 × 512 × 16 = 32,768 field elements in 𝔽p (about 1 MB), which is much smaller than the full block B of about a million field elements. The minimum number of validators in Ethereum is 128 (and there are currently around 500,000), so there are enough validators to ensure that the whole block is sufficiently well covered.
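A small Python sketch of step 4, under a purely illustrative pseudorandom assignment rule of ours: the 512 × 8192 element matrix is viewed as a 512 × 512 grid of 16-element samples, and each validator is handed two random sample-rows and two random sample-columns.

```python
# Sketch of step 4 with an illustrative assignment rule: view the 512 x 8192
# matrix E as a 512 x 512 grid of samples (each sample = 16 consecutive field
# elements of one row) and hand every validator two pseudorandom sample-rows
# and two pseudorandom sample-columns.
import random

ROWS, COLS, SAMPLE = 512, 8192, 16
SAMPLE_COLS = COLS // SAMPLE          # 512 sample-columns per row

def sample_cells(row, sample_col):
    """Field-element coordinates (i, j) covered by one sample of the grid."""
    return [(row, sample_col * SAMPLE + k) for k in range(SAMPLE)]

def assign_fragment(validator_id, seed="toy-seed"):
    """Pick 2 sample-rows and 2 sample-columns for this validator (illustrative rule)."""
    rng = random.Random(f"{seed}/{validator_id}")
    return rng.sample(range(ROWS), 2), rng.sample(range(SAMPLE_COLS), 2)

rows, cols = assign_fragment(42)
# 2 full sample-rows + 2 full sample-columns = (2*512 + 2*512) samples of 16 elements
print(rows, cols, (2 * SAMPLE_COLS + 2 * ROWS) * SAMPLE)  # ... 32768 field elements
```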
step 5: The block builder sends the triple (Pi, C, πi) to validator number i, where πi is a list of evaluation proofs for all the elements in Pi: one proof for each cell d(x,y) in the two rows and two columns of samples in Pi. The proofs in πi enable the validator to verify that every field element in Pi is consistent with the commitment C.
In danksharding the number of validators can greatly exceed the number of columns or rows in the block. Therefore, some columns and rows might be stored by multiple validators. As such, danksharding uses both replication and Reed-Solomon erasure coding to ensure that the data can be reconstructed.
Full block’s reconstruction
When a reconstruction agent needs to reconstruct the entire block, it holds C and asks the validator set to send it their fragments. In response, an honest validator i sends (Pi, πi). The reconstruction agent checks the opening proofs in πi and, if they are valid, accepts the values in Pi. Hence, byzantine validators cannot send false data; they can, however, refuse to send their data and simply not respond to the reconstruction agent.
When some data is missing, danksharding needs at least 75% of the matrix E to reconstruct the block. Since the validators only store complete rows and columns, the worst case is, for example, the following pattern: the adversary erases the cells in the intersection of just over half of the rows (257 of the 512) and just over half of the columns (4097 of the 8192). Then 75% – ε of the block's elements are present, yet it is impossible to reconstruct the missing elements.
To see why the data cannot be reconstructed in this case, observe that there is a non-zero bivariate polynomial δ(X,Y) that evaluates to zero on the entire "present" region and has the required degree bounds in X and Y: take δ(X,Y) = A(X) · B(Y), where A vanishes on the 255 fully present rows and B vanishes on the 4095 fully present columns. Therefore, during reconstruction it is not possible to tell whether the missing area comes from the correct polynomial d(X,Y) or from d(X,Y) + δ(X,Y); both polynomials agree on the present data. As a result, when the missing data follows this pattern (up to a permutation of rows and columns), it cannot be reconstructed. Note that in this model, if a validator drops some of its data, thereby becoming "byzantine," it drops all of it: both of its rows and both of its columns.
So less than 75% of the block is not enough in the worst case. However, when 75% or more of the block is present, reconstruction is guaranteed to succeed by a simple greedy algorithm.
The reconstruction algorithm: Reconstruction works by iteratively finding an incomplete row (or column) with at least 50% of its elements available and using univariate interpolation to reconstruct the missing elements of that row (or column). To see why this procedure eventually reconstructs the whole block, suppose the block is still missing some elements, yet no row or column can be reconstructed. This means that every row and column either has less than 50% of its elements available or is complete. Select any incomplete row: more than 50% of its elements are unavailable, and every column passing through one of those unavailable elements is itself incomplete and therefore also has more than 50% of its elements unavailable. It follows that more than 25% of the block is unavailable, contradicting the assumption that at least 75% is present.
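Here is a sketch of the greedy schedule, operating only on an availability mask; the univariate interpolation that actually fills in the missing values is elided. The small worst-case example at the end erases the intersection of just over half the rows and just over half the columns, so the greedy pass makes no progress even though a majority of the data is still present.

```python
# Sketch of the greedy reconstruction schedule: repeatedly find an incomplete
# row or column with at least 50% of its elements available and mark it fully
# recovered (the univariate interpolation that fills in the values is elided).
def greedy_reconstruct(available):
    """available: 2D list of booleans over the expanded matrix E.
    Returns True iff the whole matrix becomes available row/column-wise."""
    rows, cols = len(available), len(available[0])
    progress = True
    while progress:
        progress = False
        for i in range(rows):                          # try to repair rows
            have = sum(available[i])
            if cols > have >= cols // 2:
                available[i] = [True] * cols
                progress = True
        for j in range(cols):                          # try to repair columns
            have = sum(available[i][j] for i in range(rows))
            if rows > have >= rows // 2:
                for i in range(rows):
                    available[i][j] = True
                progress = True
    return all(all(row) for row in available)

# Worst-case shape at toy scale: erase the intersection of 5 of 8 rows and
# 5 of 8 columns.  Every incomplete row/column has < 50% available, so the
# greedy pass makes no progress even though 61% of the cells are present.
bad = [[not (i < 5 and j < 5) for j in range(8)] for i in range(8)]
assert greedy_reconstruct(bad) is False
```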
To render one block unreconstructable, the adversary needs to attack at least 15/16 of the validator set. The attackers in this case would pick a quadrant of E to erase (slightly more than half of the rows and half of the columns) and attack every validator that has at least one of its two rows or one of its two columns intersecting that quadrant. A validator escapes the attack only if both of its rows and both of its columns avoid the chosen halves, which happens with probability roughly (1/2)^4 = 1/16.
Local reconstruction of validator’s data
Suppose that an honest validator i crashes and loses its data. When it comes back online, it wants to reconstruct its two rows and two columns Pi as well as the opening proofs πi. Danksharding enables a validator to do this without reconstructing the entire block: the validator can download its data from another validator that stores the exact same set of points, or it can obtain at least 50% of the elements of each of its rows (or columns) from other validators and reconstruct the rest through univariate interpolation. Neither option requires full block reconstruction.
For example, imagine a validator stores at least 50% but less than 100% of column number 42 and needs to reconstruct the missing elements. That is, the validator holds pairs (E[i,42], πi) ∈ 𝔽p × 𝔾 for all i ∈ S, for some set S with 256 ≤ |S| < 512. To reconstruct the missing pairs, the validator does the following:
- Using Lagrange interpolation over the field 𝔽p, it constructs the polynomial p(x) of degree at most 255 such that p(i) = E[i,42] for all i ∈ S.
- It evaluates this polynomial to obtain the missing elements: E[i,42] := p(i) for all i ∈ {0..511}\S. Note that the points are guaranteed to lie on a polynomial of degree at most 255 because the validator has checked them against the commitments C.
- It obtains the missing proofs via a multi-exponentiation, doing polynomial interpolation "in the exponent": for all i ∈ {0..511}\S it computes πi := ∑j∈S πj · Λj,S(i), where Λj,S(i) is the Lagrange coefficient Λj,S(i) := ∏k∈S,k≠j(i – k)/(j – k).
The last step uses the fact that KZG evaluation proofs are homomorphic. To the best of our knowledge, KZG is the only polynomial commitment scheme with this property.
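The sketch below mirrors this local reconstruction for one column at toy scale: values and proofs are modeled as evaluations of low-degree polynomials (integers mod p stand in for the group elements of 𝔾), a degree-7 polynomial stands in for the degree-255 column polynomial, and the Lagrange coefficients Λj,S(i) are computed exactly as in the formula above.

```python
# Sketch of local reconstruction for one column: given values (and proofs) at a
# subset S of the row indices, recover the missing ones by Lagrange
# interpolation -- on field elements for the values, and "in the exponent" for
# the proofs.  Integers mod P stand in for group elements, a degree-7
# polynomial stands in for the degree-255 column polynomial, and 16 rows
# stand in for the 512 rows of E.
import random

P = 2**255 - 19
DEG = 7                                    # stands in for degree 255

def lam(j, S, i):
    """Lagrange coefficient Lambda_{j,S}(i) = prod_{k in S, k != j} (i - k) / (j - k)."""
    num, den = 1, 1
    for k in S:
        if k != j:
            num = num * (i - k) % P
            den = den * (j - k) % P
    return num * pow(den, -1, P) % P

# A toy column: values E[i][42] = p(i) and "proofs" pi_i, both low-degree in i.
coefs = [random.randrange(P) for _ in range(DEG + 1)]
p = lambda x: sum(c * pow(x, k, P) for k, c in enumerate(coefs)) % P
proof_coefs = [random.randrange(P) for _ in range(DEG + 1)]
proof = lambda x: sum(c * pow(x, k, P) for k, c in enumerate(proof_coefs)) % P

S = random.sample(range(16), DEG + 1)      # the >= 50% of rows the validator holds
missing = [i for i in range(16) if i not in S]

for i in missing:
    value_i = sum(p(j) * lam(j, S, i) for j in S) % P       # reconstruct E[i][42]
    proof_i = sum(proof(j) * lam(j, S, i) for j in S) % P   # reconstruct pi_i
    assert value_i == p(i) and proof_i == proof(i)
```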
Sampling for data availability in danksharding
To determine whether enough of the expanded block E is available for reconstructing the original block B, the sampling client queries for random samples of E. In response to each query, it gets a sample (a 16-element row vector) and an evaluation proof to check the sample against the commitment C. If all its queries are answered successfully, the client accepts the data as available. If the client makes a total of Q queries, it falsely accepts unavailable data with probability (3/4)^Q, which drops exponentially quickly with the number of queries Q. The rate 3/4 corresponds to the reconstruction threshold of 75% from the previous section.
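As a quick sanity check on this bound, the snippet below evaluates (3/4)^Q for a few illustrative query counts; the number of queries actually used by clients is a protocol parameter, and these values are only examples.

```python
import math

# False-acceptance probability (3/4)^Q for a few illustrative query counts Q.
for Q in (10, 30, 75):
    prob = 0.75 ** Q
    print(f"Q = {Q:2d}: (3/4)^Q = {prob:.2e}  (about 2^{math.log2(prob):.0f})")
```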
In the Ethereum network, sampling will be done by three types of parties: (i) full nodes that can't afford to store the blob data in full, (ii) light clients, and (iii) the validators themselves. The validators have to certify that the data is available before casting any votes on a block and its predecessors. Each epoch has a fixed set of validators, which is split randomly into 32 committees. Each committee is assigned one 12-second slot of the epoch (there are 32 slots per epoch), and one block is expected to be confirmed per slot. Each validator receives its fragments (2 rows and 2 columns of samples) of every block, even for blocks where the validator is not part of the attesting committee. In addition, each validator samples the current block and all the previous blocks to ensure that all the blocks are both valid and available before selecting a block for attestation according to the fork-choice rule.
The protocol guarantees that the data will be available for one epoch of 32 slots, that is, for about 6.4 minutes. Additional approaches, such as proofs of retrievability or incentive mechanisms, can help ensure the data remains available for longer periods of time, e.g., the 30-60 days for which blobs are supposed to be retained before expiring.
A proposal for 25% reconstruction
In this section we explain how to make reconstruction succeed if only 25% of the data is available (compared to 75% as in current danksharding explained above).
When the fraction of available points is below 75%, the greedy row-wise and column-wise univariate interpolation can fail. Instead, we propose to do reconstruction directly through bivariate polynomial interpolation. In this case, full reconstruction is possible even if only 25% of the points of E are available. However, this requires a small modification to the way fragments are assigned to validators. Instead of giving each validator two complete rows and two complete columns of samples, we propose that each validator be assigned to store
- one complete row (512 samples), one complete column (512 samples), and
- a collection of 1024 samples randomly (or pseudorandomly) spread around the matrix E.
The overall storage requirement per validator is unchanged, but now reconstruction is possible even if fewer than 75% of the samples are available. Moreover, multiple validators store the same set of points, so that if any validator needs to reconstruct its fragment, it can request it from any other validator that stores the exact same fragment. If no such validator is available, it can do local or full reconstruction to obtain its fragment.
This hybrid proposal allows cheap reconstruction in the happy-case when the number of byzantine validators is small, so that more than 75% of the samples of E are available. In the unhappy case, when the number of byzantine validators becomes too high, the data can be reconstructed using full-bivariate interpolation from only 25% of the samples thanks to the random dispersion of samples throughout the matrix E.
Bivariate interpolation can naively be done by solving a linear system of equations in the coefficients of the polynomial. This simple approach requires constructing an interpolation matrix with 2^20 × 2^20 field elements (32 TB), which is quite large but not infeasible. However, there are better methods that can be used (see, for example, the survey by P. J. Olver, 2016). While the cost of bivariate interpolation is non-trivial, it is only needed when the fraction of recovered points is below 75%, and can be treated as a safety measure. To enable this safety measure, danksharding needs to be slightly modified to assign fragments to validators as outlined above.
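To make the naive linear-system approach concrete at toy scale, the sketch below reconstructs a 4 × 4 "block" expanded to 8 × 8 from 16 randomly placed cells (25%) by Gaussian elimination mod a small prime. Random positions almost always yield an invertible system, though not every 25% pattern does; and, as noted above, at the real 2^20-coefficient scale this naive approach is costly and better interpolation methods would be used.

```python
# Toy bivariate reconstruction from 25% of the points by solving a linear
# system mod a prime: a 4 x 4 "block" expanded to 8 x 8, so d(X,Y) has 16
# unknown coefficients and 16 sampled cells (generically) suffice.  This only
# illustrates the naive approach discussed above; it does not scale to 2^20.
import random

P = 2**31 - 1          # toy prime
NX, NY = 4, 4          # d(X,Y) has degree < NX in X and < NY in Y
EX, EY = 8, 8          # dimensions of the expanded matrix

def solve_mod_p(A, b):
    """Gauss-Jordan elimination over F_P; returns x with A x = b, or None if singular."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] % P != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, P)
        M[col] = [v * inv % P for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] % P != 0:
                f = M[r][col]
                M[r] = [(v - f * w) % P for v, w in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

# A random toy "block": choose the 16 coefficients of d(X,Y) directly.
coeffs = [[random.randrange(P) for _ in range(NY)] for _ in range(NX)]
d = lambda x, y: sum(coeffs[a][b] * pow(x, a, P) * pow(y, b, P)
                     for a in range(NX) for b in range(NY)) % P

# Keep resampling 16 random cells (25% of the 8x8 grid) until the resulting
# system is invertible, then solve for the coefficients.
recovered = None
while recovered is None:
    cells = random.sample([(x, y) for x in range(EX) for y in range(EY)], NX * NY)
    A = [[pow(x, a, P) * pow(y, b, P) % P for a in range(NX) for b in range(NY)]
         for x, y in cells]
    recovered = solve_mod_p(A, [d(x, y) for x, y in cells])

assert recovered == [coeffs[a][b] for a in range(NX) for b in range(NY)]
```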
Danksharding with IPA commitments
The main drawback of the previous construction is the need for a trusted setup for the KZG polynomial commitment scheme. Otherwise the scheme is blazingly fast. However, the trusted setup typically needs a lot of work and coordination. (Our work on on-chain KZG setups can help simplify the ceremony for future projects). Some other polynomial commitment schemes do not need a trusted setup (e.g., Bulletproofs), although they do not have homomorphic proofs required for the efficiency of validators when reconstructing the data they need to store (as pointed out by Vitalik).
The construction can be altered, however, to avoid the need for homomorphic proofs while still keeping validators lightweight. The high-level idea is to have block builders compute commitments to the columns of the block matrix E. With this approach, the validators won't need to reconstruct column proofs; they will simply reconstruct their columns in full and recompute the proofs against the column commitments from scratch. Through consensus, the validators make sure they agree on the same set of column commitments.
More precisely, the steps 1-4 are the same as in danksharding explained above. Step 5, however, is different.
step 5: The block builder computes polynomial commitments to the columns of B, denoted T = (T0, T1, . . . , T4095), where each Tj is a commitment to d(X, j) (concretely, it could be a Pedersen vector commitment to the vector of coefficients of d(X, j)). Next, it creates a proof that C and T commit to the same matrix, as follows. The block builder chooses a (pseudo)random point (x̂, ŷ), uses the homomorphism of the commitments to interpolate a column commitment Tŷ to the univariate polynomial d(X, ŷ) and a row commitment Cx̂ to the univariate polynomial d(x̂, Y), and creates two proofs: πx̂, the polynomial evaluation proof of d(x̂, ŷ) against Cx̂, and πŷ, the polynomial evaluation proof of d(x̂, ŷ) against Tŷ. The point (x̂, ŷ) is the same for all of the validators, and can be obtained, for example, from a random beacon output or generated pseudorandomly with a Fiat-Shamir transform. The block builder then sends (Pi, C, T, πx̂, πŷ, d(x̂, ŷ)) to validator number i, where Pi is the two rows and two columns of E. Apart from πx̂ and πŷ, the builder computes no evaluation proofs in this construction.
The validator verifies the proofs πx̂, πŷ to catch a malicious block builder that generates the column commitments T incorrectly. The validator then recommits to the two rows and two columns in Pi and verifies that those commitments match the corresponding entries of C and T. If a row of Pi, say row x, is within the range of the original matrix B (x ∈ {0..255}), the validator simply verifies that its commitment matches the corresponding entry Cx of C; however, if the row is in the expanded portion (x ∈ {256..511}), the validator first interpolates through the commitments in C to obtain Cx:
Cx := λ0 · C0 + . . . + λ255 · C255
Note that interpolation is possible because the IPA commitments (not their proofs though) are homomorphic. The validator verifies the columns of Pi in a similar way against T, interpolating if needed.
In this construction, the block builder does not have to compute the proofs for the validators; the validators can compute all the proofs themselves. It's an interesting research problem, however, to figure out an efficient algorithm to compute a large number of proofs at once (similar to how it's done for KZG, or using Vitalik's ideas for IPA).
The main bottleneck of this approach, as we see it, is the loss in efficiency for sampling clients, as the proofs for IPA-based schemes are logarithmic in size (in contrast to constant-size for KZG). Moreover, the sampling clients might need to download column commitments in addition to the row commitments, incurring an overhead in communication. However, we believe this to be a promising direction for further exploration.
Danksharding with Merkle commitments
Another plausible approach to avoiding the trusted setup, which Vitalik explored recently, is to use Merkle commitments instead of polynomial commitments. A Merkle commitment is not a polynomial commitment scheme, so it is more challenging to make it work with erasure coding. The block builder erasure-codes the data and commits to the expanded data with Merkle-tree roots. The main challenge is to detect incorrectly expanded data without downloading the full data. Fraud proofs can be used to get around this issue, but that relies on the existence of a client that downloads the full data, checks that it was erasure-coded and committed to correctly, and raises an alarm by providing a fraud proof if it was not.
Alternatively, a FRI proof can be used to check that the leaves of the Merkle tree are close to a Reed-Solomon codeword (i.e., check that the data underlying the Merkle commitment was erasure-coded correctly). The sampling clients would check the FRI proof and sample a sufficient fraction of the leaves by requesting their Merkle proofs to ensure the data is available and can be reconstructed.
***
Data availability sampling, and danksharding as one of its concrete instantiations, would bring the cost of data storage down, leading to more scalable and cheaper blockchains. The coding aspects of DAS are a potentially rich area for research with many possible directions to explore. We suggested one possible avenue: improving the reconstruction protocol to use fewer samples (25% instead of 75%). Another exciting direction is to explore alternative commitment schemes that do not require a trusted setup.