by Kurt Pan
Vitalik recently wrote a great post on the trade-offs about replacing the KZG commitment to an arithmetic hash-based one in EIP-4844. The article is very instructive and forward-looking, but unless you are tech-savvy and keep an eye on the state-of-art progress, there will be a lot of prerequisites to be met to understand the article.
In this short article, I will present to you some of the necessities and learning materials for understanding Vitalik’s post. The purpose is not to be mathematically or cryptographically rigorous or complete, it is mainly to serve as a preview of the landscape.
I hope this article will make the learning process a little bit easier for you.
Mathematics
Mathematics (for most people) is hard, especially the ones involved in blockchain frontier technologies, which to many seem like moon math. But at the same time, math is necessary for the clarification of concepts, formal description and analysis, security proofs, and many other necessary steps of research.
I will briefly describe some of the mathematical concepts below in informal language, for details please refer to:
- The MoonMath Manual by Least Authority
Finite fields
A field is a set on which addition, subtraction, multiplication, and division are defined and behave as the corresponding operations on rational and real numbers do.
A finite field (or Galois field) is a field that contains a finite number of elements. The most common examples of finite fields are given by the integers mod pp when pp is a prime number.
Elliptic curves
Elliptic curves are geometric objects in projective planes over a given field, made up of points that satisfy certain equations.One of their key features from the point of view of cryptography is that, if the underlying field is of positive characteristic, elliptic curves are finite, cyclic groups.Further, it is believed that, in this case, the Discrete Logarithm Problem on many elliptic curve groups is hard, given that the underlying characteristic is large enough.
Although a bit old, this is still a good introductory article on elliptic curves:
If you are a visual learner, also check out these tools for visualizing elliptic curves to help you understand:
Pairings
An elliptic curve pairing is a function that takes a pair of points on an elliptic curve and maps them to an element of some other group, called the target group.This mapping has a nice property called bilinearity that can be very useful in cryptography, which is similar to the multiplication of points in the group of elliptic curves.
A pairing-friendly curve is a special class of elliptic curves, which have a pairing map attached to them.
Pairings have a wide range of applications in cryptography and blockchain technology: both in zk-SNARKs proof systems and KZG polynomial commitment schemes.
Here are some excellent learning materials on pairings:
- Exploring Elliptic Curve Pairings by Vitalik Buterin
- BLS12–381 For The Rest Of Us by Ben Edgington
Lagrange Polynomial Interpolation
We all learned in high school that a line can be uniquely determined past two points. Similarly, nn points can uniquely determine a polynomial of order n−1n−1. Lagrange interpolation is the method used to find the expression of such polynomial.
It is widely used in tons of techniques such as Reed-Soloman error-correcting code, polynomial commitment schemes, zk-SNARKs, data availability sampling, even in secret sharing, threshold signature schemes, and many others.
Polynomial Commitment Schemes
In a commitment scheme, the committer commits to an arbitrary message by outputting some commitment. The committer can then later reveal the message, and a verifier can validate that indeed the commitment corresponds to the message.
Polynomial commitment schemes(PCS) are commitment schemes to a polynomial. PCS can achieve some nice additional properties: the committer should be able to “open” certain evaluations of the committed polynomial without revealing the entire polynomial.
KZG
KZG (which stands for Kate, Zaverucha and Goldberg) is a polynomial commitment scheme introduced in this paper in 2010. It has the advantage that both polynomial commitment size and opening proof size are constant (very small), which makes it very useful on blockchains where storage is quite expensive. However, at the same time, it has disadvantages such as requiring elliptic curve support for pairing, requiring a trusted setup, and not being resistant to quantum attacks.
For details on how KZG works, we recommend reading these materials:
Trusted Setup
A trusted setup ceremony is a procedure that is done once to generate a piece of data that must then be used every time some cryptographic protocol is run. Generating this data requires some secret information; the “trust” comes from the fact that some person or some group of people has to generate these secrets, use them to generate the data, and then publish the data and forget the secrets. But once the data is generated, and the secrets are forgotten, no further participation from the creators of the ceremony is required.-How do trusted setups work? by Vitalik Buterin
Requiring a (one-time) trusted setup ceremony is a drawback of using KZG, as it introduces another layer of trust distribution, which adds complexity to design and implementation.
Some other PCSs (such as IPA and Merkle Tree described below) and some newer zk-SNARKs proof systems (e.g., STARKs) do not require such trusted setup ceremonies (called transparency), but they all have some other trade-offs in terms of size or efficiency.
Some other related articles:
- A Guide to Understanding Trusted Setups by Panther Team
- Setup Ceremonies by Anthony Matlala
- The Trusted Setup Phase by Ittai Abraham
- Diving into the zk-SNARKs Setup Phase by Daniel Benarroch
Inner Product Argument
There are two alternatives to KZG commitments:
- Discrete-log-based Inner Product Argument (IPA) commitments
- Merkle roots based on arithmetic-friendly hash functions. Vitalik’s post focuses on this one.
You can learn IPA by searching for keywords Bulletproofs , Halo etc. You can also check these:
- Halo and more: exploring incremental verification and SNARKs without pairings by Vitalik Buterin
- Inner Product Arguments by Dankrad Feist
- The halo2 Book
- Understanding Inner Product Argument by Suyash Bagad
Merkle Tree and Verkle Tree
Merkle tree is a hash tree in which every “leaf” (node) is labelled with the cryptographic hash of a data block, and every node that is not a leaf (called a branch, inner node, or inode) is labelled with the cryptographic hash of the labels of its child nodes.A Merkle tree allows efficient and secure verification of the contents of a large data structure. (called a Merkle proof)
Merkle trees are such a fundamental concept and are the basis for further understanding of many more advanced concepts such as vector commitments, accumulators, authenticated data structures, membership proofs, etc.
You can check out these excellent links to learn more.
- What is a Merkle Tree? by Alin Tomescu
- Merkle Tree in Blockchain: What is it, How does it work and Benefits by Ravikiran A S
These will be useful if you want to learn more about an improved variant called Verkle Tree:
- Verkle trees by Vitalik Buterin
- Merkle trees vs. Verkle trees, Explained by Jagjit Singh
- Take a Deep Dive Into Verkle Tree For Ethereum by sin7y
zk-SNARKs
Welcome to the wild frontier of cryptography and blockchain technology! zk-SNARKs is much more than a buzzword that will quickly fade with the wind; instead, you can learn (almost) the entirety of cryptography by diving down this rabbit hole of zk-SNARKs.
There is a wealth of learning material available online, just to name a few.
- ZK Whiteboard Sessions by ZK HACK
- Proofs, Arguments, and Zero-Knowledge by Justin Thaler
- CS294: Foundations of Probabilistic Proofs (F2020) Alessandro Chiesa
- Anatomy of a STARK by Alan Szepieniec
Or you can just refer to Kurt Pan’s Awesome Zero-Knowledge Proofs (2022) and I will keep updating this list.
Arithmetic Hash Functions
Arithmetization is the process of turning a generic statement or question into a set of equations to be verified or solved, which is a necessary sub-process (usually the first step) in most zk-SNARKs proof systems.
Arithmetic (or ZK-friendly/ SNARK-friendly) hash functions are those hash functions that are designed to have a “simple” representation after arithmetization process. They can be verified with a simple verifier with a very low number of constraints.
The Ethereum Foundation is holding a ZK Hash Function Cryptanalysis Bounty in 2021 to design and analyze new ZK-friendly hash functions, one of which is Poseidon.
In addition to Poseidon, there are many other designs, like:
I have to give a caveat that all of the above hash functions are still very young, compared to more mature ones like SHA256 and Keccak, and need to be tested and analyzed more before being used in production.
Danksharding
Recently, Vitalik released an updated Ethernet roadmap, and we can clearly see that EIP-4844 and Danksharding are the next big steps for Ethereum after The Merge.
Danksharding combines multiple avenues of cutting edge research to provide the scalable base layer required for Ethereum’s rollup-centric roadmap.
We can roughly say that Full Danksharding = Proto-danksharding + Data Availability Sampling + PBS + 2D KZG scheme + Proof-of-custody + more
.
Here are the best materials for learning about Danksharding:
- The Hitchhiker’s Guide to Ethereum by Jon Charbonneau
- The Complete Guide to Rollups by Jon Charbonneau
- DankSharding — What is it and how does it work? by Rafael Fuentes
- What is danksharding?
Proto-danksharding (EIP-4844)
On the road to full Danksharding, some of the goodies are introduced by EIP-4844 :
- Data blob-carrying transaction format
- KZG commitments to the blobs
- etc.
You can find almost everything about EIP-4844 on this website:
Data Availability Sampling (DAS)
Data Availability Sampling (DAS) enables each node only ends up downloading a small portion of the total data. Each node (including client nodes that are not participating in staking) checks every blob, but instead of downloading the whole blob, they privately select N random indices in the blob, and attempt to download the data at just those positions.
Finally, the reference list of DAS:
- Data Availability Sampling: From Basics to Open Problems by Paradigm
- An explanation of the sharding + DAS proposal and Data availability sampling in practice by Vitalik Buterin
- Celestia’s Data Availability Layer by Celestia
Conclusion
When you keep learning and finally get the feeling that everything is connected, you’re not far from the cutting edge of innovation.
All Comments