From the beginning of the Ethereum project, there has been a strong philosophy of keeping the core of Ethereum as simple as possible and achieving as much as possible by building protocols on top of it. In the blockchain field, the debate between "building on L1" and "focusing on L2" is usually seen as being mainly about scalability, but the same question arises in meeting many other needs of Ethereum users: digital asset exchange, privacy, usernames, advanced cryptography, account security, censorship resistance, front-running protection, and so on. Recently, however, there has been a cautious willingness to enshrine more of these features into the core Ethereum protocol.
This article will delve into some of the philosophical reasoning behind the original philosophy of minimal encapsulation, that is, of enshrining as little as possible in the protocol, as well as some more recent ways of thinking about these ideas. The goal is to begin building a framework for better identifying possible targets where encapsulating certain functionality in the protocol may be worth considering.
About the Early Philosophy of Protocol Minimalism
During the early history of what was then known as "Ethereum 2.0", there was a strong desire to create a clean, simple, and elegant protocol that did as little as possible itself and left almost everything to users. Ideally, the protocol would be just a virtual machine, and validating a block would be just a virtual machine call.
The "state transition function" (the function that processes a block) would be just a single VM call, and all other logic would happen through contracts: a few system-level contracts, but mostly contracts provided by users. A nice feature of this model is that even an entire hard fork could be described as a single transaction to the block processor contract, which would be approved through off-chain or on-chain governance and then run with escalated permissions.
These discussions from 2015 are particularly relevant to two areas considered here: account abstraction and scaling. In the case of scaling, the idea was to create a maximally abstract form of scaling that would feel like a natural extension of this model. A contract could make a call to a piece of data that most Ethereum nodes do not store, and the protocol would detect this and resolve the call through some very generic scaled-computation mechanism. From the perspective of the virtual machine, the call would go off into some separate subsystem and then, some time later, magically come back with the correct answer.
We explored this idea briefly but quickly abandoned it, because we were too preoccupied with verifying that any kind of blockchain scaling was possible at all. As we will see later, though, the combination of data availability sampling and ZK-EVMs means that one possible future for Ethereum scaling actually looks very close to this vision! For account abstraction, on the other hand, we knew from the beginning that some kind of implementation was possible, so research immediately began on making something as close as possible to the pure starting point of "a transaction is just a call" a reality.
There is a lot of boilerplate code involved in handling transactions and making actual low-level EVM calls from the sender address. And there will be even more boilerplate code in the future. How can we minimize this code as much as possible, approaching zero?
One of the main pieces of code here is validate_transaction(state, tx), which checks the nonce and signature of the transaction. From the beginning, the practical goal of account abstraction has been to let users replace the basic nonce-incrementing and ECDSA verification with their own verification logic, making it easier to use features such as social recovery and multi-signature wallets. Therefore, finding a way to restructure apply_transaction as a simple EVM call was not just a matter of "making the code clean for the sake of making the code clean"; it was about moving the logic into the user's account code and giving users the flexibility they need.
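As a rough illustration of the idea in the preceding paragraph, the sketch below shows a validate_transaction whose check is supplied by the account itself. Everything beyond the name validate_transaction(state, tx) is hypothetical: the Tx and Account classes, the validate hook, and the byte-string comparisons that stand in for real signature verification are assumptions made only for this example, not part of any EIP.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional


@dataclass
class Tx:
    sender: str
    nonce: int
    auth: bytes  # signature, or any other proof the account understands


@dataclass
class Account:
    nonce: int = 0
    # Hypothetical hook: account-supplied logic replacing the fixed check.
    validate: Optional[Callable[["Account", Tx], bool]] = None


def default_validate(account: Account, tx: Tx) -> bool:
    # Toy stand-in for "check the nonce and the ECDSA signature".
    return tx.nonce == account.nonce and tx.auth == b"owner-signature"


def make_multisig_validate(signers: FrozenSet[bytes], threshold: int):
    # Builds validation logic that accepts a comma-separated list of approvals.
    def validate(account: Account, tx: Tx) -> bool:
        approvals = set(tx.auth.split(b","))
        return tx.nonce == account.nonce and len(approvals & signers) >= threshold
    return validate


def validate_transaction(state: Dict[str, Account], tx: Tx) -> bool:
    # The protocol only dispatches; the rule itself lives with the account.
    account = state[tx.sender]
    check = account.validate or default_validate
    return check(account, tx)


state = {
    "alice": Account(),
    "team": Account(validate=make_multisig_validate(frozenset({b"a", b"b", b"c"}), 2)),
}
```

Here "alice" keeps the default rule while "team" swaps in a 2-of-3 rule without any change to validate_transaction itself, which is the flexibility the paragraph describes.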
However, the approach of insisting on including as little fixed logic as possible in apply_transaction ultimately brought many challenges. We can look at one of the earliest account abstraction proposals, EIP-86.
If included as-is, EIP-86 would have reduced the complexity of the EVM at the cost of significantly increasing the complexity of other parts of the Ethereum stack: essentially the same code would have had to be written elsewhere, and entirely new kinds of weirdness would have been introduced, such as the possibility of the same transaction with the same hash appearing multiple times in the chain, not to mention the multi-invalidation problem.
The multi-invalidation problem in account abstraction: a single transaction included on chain can invalidate thousands of other transactions in the mempool, making it cheap to flood the mempool.
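The multi-invalidation problem can be made concrete with a toy model. The sketch below assumes, purely for illustration, that every pending transaction's validity depends on one shared storage flag (the name gate_open is invented here); a single included transaction that flips the flag then invalidates the whole pending set at once.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]


def make_pending(count: int) -> List[Callable[[State], bool]]:
    # Every pending transaction's validity depends on the same storage flag.
    return [lambda state: state["gate_open"] for _ in range(count)]


def count_valid(pending: List[Callable[[State], bool]], state: State) -> int:
    # A mempool has to re-run each validity check after the state changes.
    return sum(1 for is_valid in pending if is_valid(state))


state = {"gate_open": True}
pending = make_pending(1000)
valid_before = count_valid(pending, state)

# One included transaction flips the shared flag...
state["gate_open"] = False
# ...and every pending transaction becomes invalid in a single step.
valid_after = count_valid(pending, state)
```

The attacker pays for one on-chain transaction, while nodes have already spent resources validating and propagating all of the others.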
Since then, account abstraction has developed in stages. EIP-86 later became EIP-208, and finally the practical and feasible EIP-2938 emerged.
However, EIP-2938 is not simple at all. Its content includes:
· New transaction type
· Three global variables for three new transaction scopes.
· Two new opcodes, including the very unwieldy PAYGAS opcode, which handles gas price and gas limit checks, acts as an EVM execution breakpoint, and temporarily stores ETH for fee payment, all at once.
· A set of complex mining and rebroadcasting strategies, including a list of opcodes prohibited during the transaction verification phase.
In order to achieve account abstraction without involving Ethereum core developers (who were focused on optimizing Ethereum clients and implementing the Merge), EIP-2938 was ultimately restructured as the entirely off-protocol ERC-4337.
Because this is an ERC, it does not require a hard fork and exists technically "outside the Ethereum protocol". So... is the problem solved? It turns out not. The current mid-term roadmap for ERC-4337 actually involves transforming most of ERC-4337 into a series of protocol features, which is a useful guiding example for understanding why this path should be considered.
Encapsulating ERC-4337
Several key reasons were discussed for ultimately re-including ERC-4337 in the protocol:
Gas efficiency: Any operation performed inside the EVM incurs some degree of virtual machine overhead, including inefficiency when using gas-expensive features such as storage slots. Currently, these extra inefficiencies add up to at least roughly 20,000 gas, and often more. The simplest way to eliminate them is to move these components into the protocol.
Code bug risk: If the "entry point contract" of ERC-4337 has a serious enough bug, all wallets compatible with ERC-4337 may see their funds depleted. Replacing the contract with protocol functionality creates an implicit responsibility to eliminate the risk of fund depletion for users by fixing code errors through a hard fork.
Support for EVM opcodes such as tx.origin: ERC-4337 by itself causes tx.origin to return the address of the "bundler" that packaged a set of user operations into a transaction. Native account abstraction solves this by making tx.origin point to the actual account that sent the transaction, so that it behaves the same way as for an EOA.
Censorship resistance: One of the challenges of proposer/builder separation is that it makes it easier to censor individual transactions. In a world where the Ethereum protocol can recognize individual transactions, inclusion lists can greatly alleviate this problem by allowing proposers to specify a list of transactions that must be included within the next two slots in almost all cases. However, the off-protocol ERC-4337 wraps "user operations" inside a single transaction, making user operations opaque to the Ethereum protocol; inclusion lists provided by the Ethereum protocol therefore cannot provide censorship resistance to ERC-4337 user operations. Encapsulating ERC-4337 and making user operations a "proper" transaction type would solve this problem.
It is worth mentioning that in its current form, ERC-4337 is much more expensive than "basic" Ethereum transactions: a basic transaction costs 21,000 gas, while an ERC-4337 operation costs about 42,000 gas.
In theory, it should be possible to adjust the EVM gas cost system until in-protocol costs and the costs of accessing storage from outside the protocol match; there is no reason why transferring ETH should cost 9,000 gas when other kinds of storage-editing operations are much cheaper. In fact, two EIPs related to the upcoming Verkle tree transition try to do exactly this. However, even if we do this, there is one obvious reason why encapsulated protocol functionality will inevitably be much cheaper than EVM code, no matter how efficient the EVM becomes: encapsulated code does not need to pay gas for being pre-loaded.
A fully functional ERC-4337 wallet is large: one implementation, compiled and deployed on chain, takes up approximately 12,800 bytes. Of course, you can deploy this code once and use DELEGATECALL to let each individual wallet call it, but the code still needs to be accessed in every block in which it is used. Under the Verkle tree gas cost EIP, 12,800 bytes make up 413 chunks, and accessing these chunks requires paying 2 times WITNESS_BRANCH_COST (3,800 gas in total) and 413 times WITNESS_CHUNK_COST (82,600 gas in total). And this does not even begin to count the ERC-4337 entry point itself, which in version 0.6.0 is 23,689 bytes on chain (approximately 158,700 gas to load under the Verkle tree EIP rules).
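The arithmetic in the preceding paragraph can be reproduced with a few lines of code. This is a back-of-the-envelope sketch, not the EIP's actual accounting: the per-unit costs of 1,900 and 200 gas are simply inferred from the totals quoted above (3,800 / 2 and 82,600 / 413), the 31-byte chunk size is an assumption consistent with 12,800 bytes forming 413 chunks, and the branch counts (2 for the wallet, 3 for the entry point) are inputs chosen to match the quoted figures rather than derived from the tree layout.

```python
import math

CHUNK_SIZE = 31              # assumed bytes of code per chunk
WITNESS_BRANCH_COST = 1900   # inferred from 3,800 gas for 2 branches
WITNESS_CHUNK_COST = 200     # inferred from 82,600 gas for 413 chunks


def code_chunks(code_size: int) -> int:
    # Number of chunks needed to hold code_size bytes of contract code.
    return math.ceil(code_size / CHUNK_SIZE)


def code_access_gas(code_size: int, branches: int) -> int:
    # Gas to touch every chunk of the code, given how many branches are opened.
    return branches * WITNESS_BRANCH_COST + code_chunks(code_size) * WITNESS_CHUNK_COST


wallet_gas = code_access_gas(12_800, branches=2)       # wallet implementation
entry_point_gas = code_access_gas(23_689, branches=3)  # ERC-4337 entry point v0.6.0

print(code_chunks(12_800), wallet_gas, entry_point_gas)
```

Under these assumptions the chunk cost dominates, so the fixed cost scales almost linearly with the size of the shared code.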
This leads to a problem: the gas cost of actually accessing this code must somehow be split among transactions. The current approach used by ERC-4337 is not great: the first transaction in a bundle eats the one-time storage/code reading costs, making it much more expensive than the rest. Protocol encapsulation would allow these commonly shared libraries to simply be part of the protocol, accessible to everyone at no cost.
What can we learn from this example about when to encapsulate more generally?
In this example, we see several different rationales for encapsulating aspects of account abstraction in the protocol.
When fixed costs are high, market-based approaches that "push complexity to the edge" are most likely to fail. In fact, the long-term account abstraction roadmap looks like it will involve a lot of fixed costs per block. 244,100 gas for loading standardized wallet code is one thing; but aggregation could add hundreds of thousands more gas for ZK-SNARK verification, plus the on-chain cost of proof verification. There is no way to charge users for these costs without introducing a lot of market inefficiencies, and turning some of these features into protocol features that everyone can access for free solves this problem well.
Community-wide response to code bugs. If some piece of code is used by all users, or by a very wide range of users, it often makes more sense for the blockchain community to take on the responsibility of hard-forking to fix any bugs that arise. ERC-4337 introduces a lot of globally shared code, and in the long run it is undoubtedly more reasonable to fix bugs in that code through hard forks than to let users lose a lot of ETH.
Sometimes, a stronger form of a feature can be achieved by implementing it directly in the protocol. A key example here is in-protocol censorship resistance such as inclusion lists: in-protocol inclusion lists can guarantee censorship resistance better than off-protocol approaches. For user-level operations to truly benefit from in-protocol inclusion lists, individual user-level operations need to be "legible" to the protocol. Another lesser-known example is the 2017-era Ethereum proof-of-stake design, which had account abstraction for staking keys; this was abandoned in favor of encapsulating BLS, because BLS supports an "aggregation" mechanism that has to be implemented at the protocol and network levels and that makes processing very large numbers of signatures much more efficient.
But it is important to remember that, compared to the status quo, even encapsulated in-protocol account abstraction is still a massive "de-encapsulation". Today, top-level Ethereum transactions can only be initiated from externally owned accounts (EOAs), which are verified using a single secp256k1 elliptic curve signature. Account abstraction removes this and leaves the verification conditions to be defined by users. Therefore, in this story about account abstraction, we also see the biggest argument against encapsulation: being flexible enough to meet the needs of different users.
Let's flesh out this story further by examining several other features that have recently been considered for encapsulation. We will focus specifically on: ZK-EVMs, proposer-builder separation, private mempools, liquid staking, and new precompiles.
Encapsulating ZK-EVM
Let's shift our focus to another potential encapsulation target for the Ethereum protocol: ZK-EVMs. Currently, we have a large number of ZK-rollups, all of which must write fairly similar code to verify the execution of Ethereum-like blocks inside a ZK-SNARK. There is a fairly diverse ecosystem of independent implementations: PSE ZK-EVM, Kakarot, Polygon ZK-EVM, Linea, Zeth, and so on.
A recent controversy in the EVM ZK-rollup field concerns how to handle potential bugs in the ZK code. Currently, all running systems have some form of "security council" mechanism that can override the proof system in case of a bug. Last year, I tried to create a standardized framework to encourage projects to be clear about how much they trust the proof system versus the security council, and to gradually reduce the council's power over time.
In the medium term, rollups may rely on multiple proof systems, with the security council only having the power to intervene in the extreme case where two different proof systems disagree with each other.
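The decision rule described above is small enough to write down. The sketch below is one possible reading of it, with hypothetical names (Outcome, settle): a batch is accepted or rejected automatically when both proof systems agree, and the security council is consulted only when they diverge.

```python
from enum import Enum


class Outcome(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    ESCALATE = "escalate to security council"


def settle(prover_a_accepts: bool, prover_b_accepts: bool) -> Outcome:
    # Agreement between the two proof systems is final; no council involved.
    if prover_a_accepts and prover_b_accepts:
        return Outcome.ACCEPT
    if not prover_a_accepts and not prover_b_accepts:
        return Outcome.REJECT
    # Divergent results suggest a bug in one system: only now may the
    # council intervene.
    return Outcome.ESCALATE
```

The design choice is that the council has no power at all in the common case, which is what limits the trust placed in it.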
However, there is a sense in which some of this work feels redundant. We already have the Ethereum base layer, which has an EVM, and we already have a working mechanism for dealing with implementation bugs: if there is a bug, the affected clients are updated to fix it, and the chain continues. Blocks that appeared finalized from the perspective of a buggy client may end up no longer finalized, but at least users do not lose funds. Similarly, if rollups only want to remain equivalent to the EVM, it feels wrong that they need to implement their own governance to keep changing their internal ZK-EVM rules to match upgrades to the Ethereum base layer, when they are ultimately built on top of the Ethereum base layer itself, which knows when it is upgraded and to what new rules.
Since these L2 ZK-EVMs basically use exactly the same EVM as Ethereum, can we somehow make "verifying EVM execution in ZK" a protocol feature and handle exceptional situations such as bugs and upgrades by applying Ethereum's social consensus, just as we already do for base-layer EVM execution itself?
This is an important and challenging topic.
One possible topic of debate about data availability in a native ZK-EVM is statefulness. ZK-EVMs are much more data-efficient if they do not need to carry "witness" data. That is, if a particular piece of data has already been read or written in a previous block, we can simply assume that provers have access to it and do not need to make it available again. This goes beyond not reloading storage and code; in fact, if a rollup compresses data correctly, stateful compression can save up to 3 times more data than stateless compression.
This means that for a ZK-EVM precompile, we have two options:
1. The precompile requires all data to be available in the same block. This means that provers can be stateless, but it also means that a ZK-rollup using this precompile is much more expensive than a rollup using custom code.
2. The precompile allows pointers to data used or generated by previous executions. This lets ZK-rollups get close to optimal, but it is more complex and introduces a new kind of state that provers must store.
What can we learn from this? There is a good argument for encapsulating ZK-EVM verification in some way: rollups are already building their own custom versions of it, and it feels wrong that Ethereum is willing to put the weight of its multiple implementations and off-chain social consensus behind EVM execution on L1, while L2s doing exactly the same work must instead implement complicated gadgets involving security councils. However, the devil is in the details: there are different versions of an encapsulated ZK-EVM, with different costs and benefits. The stateful versus stateless distinction only scratches the surface; attempting to support "almost-EVMs" whose custom code has been proven by other systems would likely expose an even larger design space. Encapsulating ZK-EVM therefore brings both promise and challenges.
Encapsulated Proposer-Builder Separation (ePBS)
The rise of MEV has made block production an economies-of-scale activity, in which sophisticated participants can produce blocks that generate more revenue than the default algorithm, which simply watches the mempool for transactions and includes them. So far, the Ethereum community has tried to address this with off-protocol proposer-builder separation schemes such as MEV-Boost, which allow regular validators ("proposers") to outsource block construction to specialized participants ("builders").
However, MEV-Boost carries a trust assumption in a new category of participants called relays. Over the past two years, there have been many proposals to create an "encapsulated PBS". What is the benefit of this? In this case, the answer is simple: a PBS built directly using protocol features is stronger (in the sense of having weaker trust assumptions) than one built without them. This is similar to the case for encapsulating in-protocol price oracles, although in that case there is also a strong counterargument.
Encapsulating Private Mempools
When a user sends a transaction, the transaction is immediately made public and visible to everyone, even before it is included in the chain. This makes users of many applications vulnerable to economic attacks, such as front-running.
Recently, there have been many projects dedicated to creating "private mempools" (or "encrypted mempools"), which keep users' transactions encrypted until they are irreversibly accepted into a block.
The problem, however, is that such a scheme requires a particular kind of encryption: to prevent users from flooding the system and front-running the decryption process, the encryption must decrypt automatically once the transaction has been irreversibly accepted.
In order to achieve this form of encryption, there are various techniques with different trade-offs. Jon Charbonneau has provided a good description:
Encryption to a centralized operator, such as Flashbots Protect.
Time-lock encryption, a form of encryption that anyone can decrypt after a certain number of sequential computation steps, which cannot be parallelized.
Threshold encryption, which trusts an honest-majority committee to decrypt the data. For a concrete proposal, see the Shutterized Beacon Chain concept.
Trusted hardware, such as SGX.
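To give a feel for the time-lock option in the list above, here is a toy puzzle in the style of the classic repeated-squaring construction by Rivest, Shamir, and Wagner. The source does not say which construction a real encrypted mempool would use, so this choice is an assumption, and the parameters are far too small to be secure. The function names make_puzzle and solve_puzzle are invented for this sketch.

```python
# Two known Mersenne primes; a real deployment would use large secret primes.
P = 2**31 - 1
Q = 2**61 - 1


def make_puzzle(secret: int, steps: int, p: int = P, q: int = Q, base: int = 2) -> dict:
    n = p * q
    phi = (p - 1) * (q - 1)
    # Shortcut only available to whoever knows the factorization of n:
    # reduce the huge exponent 2**steps modulo phi(n) first.
    exponent = pow(2, steps, phi)
    mask = pow(base, exponent, n)
    return {"n": n, "base": base, "steps": steps, "ciphertext": (secret + mask) % n}


def solve_puzzle(puzzle: dict) -> int:
    # Without the factorization, the solver must square sequentially;
    # each step depends on the previous one, so it cannot be parallelized.
    x = puzzle["base"]
    for _ in range(puzzle["steps"]):
        x = (x * x) % puzzle["n"]
    return (puzzle["ciphertext"] - x) % puzzle["n"]


puzzle = make_puzzle(secret=123456789, steps=10_000)
recovered = solve_puzzle(puzzle)
```

The creator does a constant amount of work, while anyone else needs roughly `steps` sequential squarings, which is what lets the delay be tuned.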
Unfortunately, each of these encryption methods has its own weaknesses. Although each solution has some users willing to trust it, none is trusted enough to actually be accepted at Layer 1. Therefore, at least until delay encryption is perfected or some other technological breakthrough occurs, encapsulating anti-front-running functionality at Layer 1 looks like a difficult proposition, even though it is a valuable enough feature that many application-layer solutions have already emerged.
Encapsulating Liquid Staking
One common need for Ethereum DeFi users is to be able to use their ETH for both staking and as collateral in other applications. Another common need is simply for convenience: users want to be able to stake without the complexity of running a node and keeping it online at all times (and protecting their online staking keys).
So far, the simplest "interface" in the crypto industry that meets both of these needs is just an ERC-20 token: convert your ETH into "staked ETH", hold it, and then convert it back. Indeed, liquid staking providers such as Lido and Rocket Pool have already emerged to do exactly this. However, liquid staking has some natural centralizing mechanics at work: people naturally move into the largest version of staked ETH because it is the most familiar and the most liquid.
Each version of staked ETH needs a mechanism to determine who can become an underlying node operator. It cannot be unrestricted, because attackers would then join and amplify their attacks with users' funds. Currently, the top two are Lido and Rocket Pool, with the former having a DAO that whitelists node operators and the latter allowing anyone to run a node with a deposit of 8 ETH. The two approaches carry different risks: the Rocket Pool approach allows attackers to 51% attack the network while forcing users to pay most of the costs; with the DAO approach, if a single staked token dominates, a single, potentially attackable governance gadget ends up controlling a large portion of all Ethereum validators. It is worth noting that protocols like Lido have implemented precautions against this, but one layer of defense may not be enough.
In the short term, one option is to encourage ecosystem participants to use a diverse set of liquid staking providers, to reduce the chance of systemic risk arising from a single dominant provider. In the long term, however, this is an unstable equilibrium, and relying too heavily on moral pressure to solve the problem is dangerous. A natural question arises: does it make sense to encapsulate some functionality in the protocol to make liquid staking less centralizing?
The key question here is: what kind of in-protocol functionality? Simply creating a protocol-native fungible "staked ETH" token has a problem: either it must have an encapsulated Ethereum-wide governance mechanism to choose who runs the nodes, or it is open to everyone, which turns it into a tool for attackers.
An interesting idea is Dankrad Feist's writing on liquid staking maximalism. First, we bite the bullet and accept that if Ethereum is subjected to a 51% attack, only 5% of the attacking ETH will be slashed. This is a reasonable trade-off: currently, over 26 million ETH is staked, and an attack cost of one-third of that (about 8 million ETH) is overkill, especially considering how many "outside-the-model" attacks can be carried out at lower cost. In fact, a similar trade-off has already been discussed in the "super-committee" proposal for implementing single-slot finality.
If we accept that only 5% of the attacking ETH is slashed, then over 90% of staked ETH is not exposed to slashing at all. It can therefore be issued as a fungible in-protocol liquid staking token and used by other applications.
This path is very interesting. But it still leaves one question: what exactly is being encapsulated? Rocket Pool already works in a very similar way: each node operator provides some of the funds, and liquid stakers provide the rest. We could simply adjust a few constants to cap the maximum slashing penalty at 2 ETH, and Rocket Pool's existing rETH would become risk-free.
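The claim that a capped penalty makes rETH risk-free follows from simple arithmetic. The sketch below assumes, for illustration only, that the node operator's 8 ETH deposit absorbs penalties before depositors lose anything and that a validator holds 32 ETH; the function names are hypothetical.

```python
VALIDATOR_BALANCE_ETH = 32.0  # assumed total stake per validator
OPERATOR_BOND_ETH = 8.0       # node operator deposit mentioned for Rocket Pool
MAX_PENALTY_ETH = 2.0         # proposed cap on the slashing penalty


def applied_penalty(raw_penalty: float, cap: float) -> float:
    # The protocol never slashes more than the cap.
    return min(raw_penalty, cap)


def depositor_loss(penalty: float, operator_bond: float) -> float:
    # Assumption: the operator's bond is burned first; depositors pay the rest.
    return max(0.0, penalty - operator_bond)


# Worst case today: the whole validator balance is slashed.
loss_uncapped = depositor_loss(VALIDATOR_BALANCE_ETH, OPERATOR_BOND_ETH)

# Worst case with the cap: the bond covers the penalty entirely.
loss_capped = depositor_loss(
    applied_penalty(VALIDATOR_BALANCE_ETH, MAX_PENALTY_ETH), OPERATOR_BOND_ETH
)
```

As long as the cap stays at or below the operator's bond, the depositor's share is never touched, which is the sense in which the pooled token becomes risk-free with respect to slashing.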
There are other clever things we can do with simple protocol adjustments. For example, suppose we want a system with two "tiers" of staking: node operators (with high collateral requirements) and depositors (with no minimum collateral and the ability to join and leave at any time). Suppose we also want to guard against node operator centralization by giving powers to a randomly sampled committee of depositors, such as proposing lists of transactions that must be included (for censorship resistance), controlling the fork choice during an inactivity leak, or being required to sign blocks. This can be done in an essentially protocol-agnostic way by adjusting the protocol to require each validator to provide (i) a regular staking key and (ii) an ETH address that can be called between slots to output a secondary staking key. The protocol would give power to both keys, but the mechanism for choosing the second key in each slot could be left to the staking pool protocol. Directly encapsulating some functionality may still be better, but it is worth noting that this design space of "including some things and leaving other things to users" exists.
Encapsulating More Precompiles
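A minimal sketch of the two-key design described above might look as follows. All names are hypothetical, a plain Python callable stands in for "an ETH address that can be called between slots", and the hash-based committee sampling is just one possible policy a staking pool could choose; the protocol side only sees the key that comes back.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Validator:
    staking_key: str
    # Stand-in for the callable ETH address that outputs the secondary key.
    secondary_key_source: Callable[[int], str]


def pool_committee_picker(depositor_keys: List[str], seed: str) -> Callable[[int], str]:
    # Pool-defined policy (an assumption): pick one depositor key per slot
    # pseudo-randomly from a seed and the slot number.
    def pick(slot: int) -> str:
        digest = hashlib.sha256(f"{seed}:{slot}".encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(depositor_keys)
        return depositor_keys[index]
    return pick


def keys_for_slot(validator: Validator, slot: int) -> Tuple[str, str]:
    # Protocol side: it knows nothing about how the second key was chosen.
    return validator.staking_key, validator.secondary_key_source(slot)


def block_is_authorized(validator: Validator, slot: int, signers: Set[str]) -> bool:
    # One illustrative power: both keys must have signed the block.
    primary, secondary = keys_for_slot(validator, slot)
    return primary in signers and secondary in signers


pool = pool_committee_picker(["dep-key-1", "dep-key-2", "dep-key-3"], seed="example")
validator = Validator(staking_key="operator-key", secondary_key_source=pool)
```

Swapping pool_committee_picker for a different selection rule requires no change to keys_for_slot or block_is_authorized, which is the point of leaving that mechanism to the staking pool.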
A precompile (or "precompiled contract") is an Ethereum contract that implements a complex cryptographic operation and whose logic is implemented natively in client code rather than in EVM smart contract code. Precompiles were a compromise adopted at the beginning of Ethereum's development: because the overhead of the virtual machine is too high for certain very complex and highly specialized code, a few key operations that are valuable to important applications can be implemented in native code to make them faster. Today, this basically covers a few specific hash functions and elliptic curve operations.
There is currently a push to add a precompile for secp256r1, an elliptic curve slightly different from the secp256k1 used for basic Ethereum accounts. Because it is well supported by trusted hardware modules, its widespread use could improve wallet security. In recent years, the community has also pushed to add precompiles for BLS12-377, BW6-761, generalized pairings, and other features.
The counterargument to these requests for more precompiles is that many of the precompiles added previously (such as RIPEMD and BLAKE) ended up being used far less than expected, and we should learn from this. Instead of adding more precompiles for specific operations, we may want to focus on a more moderate approach based on ideas such as EVM-MAX and the dormant-but-always-revivable SIMD proposal, which would allow EVM implementations to execute wide classes of code at lower cost. Perhaps even existing, rarely used precompiles could be removed and replaced with (inevitably less efficient) EVM code implementing the same function. That said, it is still possible that there are specific cryptographic operations valuable enough to be worth accelerating, in which case adding them as precompiles would make sense.
What have we learned from all of this?
The desire to encapsulate as little as possible is understandable and good; it stems from the Unix philosophical tradition of creating minimal software that can easily be adapted to users' different needs and avoids the curse of software bloat. However, a blockchain is not a personal computing operating system; it is a social system. This means that encapsulating certain functions in the protocol sometimes makes sense.
In many cases, these other examples are similar to what we saw with account abstraction. But we have also learned some new lessons:
Encapsulation functionality can help avoid centralization risks in other areas of the stack:
Usually, minimizing and simplifying the basic protocol will push complexity to some ecosystems outside the protocol. From the perspective of the Unix philosophy, this is good. However, sometimes there are ecosystems outside the protocol that pose centralization risks, usually (but not exclusively) due to high fixed costs. Encapsulation can sometimes reduce actual centralization.
Too much encapsulation may lead to excessive expansion of trust and governance burden of the protocol:
This is the theme of the earlier article "Don't Overload Ethereum's Consensus": if encapsulating a specific feature weakens the trust model and makes Ethereum as a whole more "subjective", it weakens Ethereum's credible neutrality. In these cases, it is better to build the feature as a mechanism on top of Ethereum rather than trying to bring it into Ethereum itself. The encrypted mempool is the best example here of something that may be too difficult to encapsulate, at least until delay encryption technology improves.
Too much encapsulation may make the protocol too complex:
Protocol complexity is a systemic risk, and adding too many features to the protocol increases that risk. Precompiles are the best example.
In the long run, encapsulation may backfire as user needs are unpredictable:
A feature that many people consider important and will be used by many users may not be frequently used in practice.
Additionally, the liquid staking, ZK-EVM, and precompile examples demonstrate the possibility of a middle path: minimal viable enshrinement. The protocol does not need to encapsulate an entire feature; it can instead include a specific piece that addresses the key challenges and makes the feature easy to implement, without being too opinionated or too narrow. Examples of this include:
Instead of encapsulating a complete liquid staking system, change the staking penalty rules to make trustless liquid staking more feasible.
Instead of encapsulating more precompiles, encapsulate EVM-MAX and/or SIMD to make a wider range of operations easier to implement efficiently.
Instead of encapsulating the entire concept of a rollup, simply encapsulate EVM verification.
We can expand the previous chart as follows:
Sometimes, it even makes sense to de-encapsulate certain things, and removing rarely used precompiles is one example. Account abstraction as a whole, as mentioned earlier, is also an important form of de-encapsulation. If we want to support backward compatibility for existing users, the mechanism may actually be surprisingly similar to the mechanism for de-encapsulating precompiles: one proposal is EIP-5003, which would allow EOAs to convert their accounts into contracts with the same (or better) functionality.
Which functions should be introduced into the protocol and which functions should be left to other layers of the ecosystem is a complex balance. As we continue to improve our understanding of user needs and available ideas and technology suites, this balance is expected to continue to improve over time.