Possible futures of the Ethereum protocol, part 3: The Scourge

Special thanks to Justin Drake, Caspar Schwarz-Schilling, Phil Daian, Dan Robinson and Max Resnick for feedback and review, and the ethstakers community for discussion.

One of the biggest risks to the Ethereum L1 is proof-of-stake centralizing due to economic pressures. If there are economies-of-scale in participating in core proof of stake mechanisms, this would naturally lead to large stakers dominating, and small stakers dropping out to join large pools. This leads to higher risk of 51% attacks, transaction censorship, and other crises. In addition to the centralization risk, there are also risks of value extraction: a small group capturing value that would otherwise go to Ethereum's users.

Over the last year, our understanding of these risks has increased greatly. It's well understood that there are two key places where this risk exists: (i) block construction, and (ii) staking capital provision. Larger actors can afford to run more sophisticated algorithms ("MEV extraction") to generate blocks, giving them a higher revenue per block. Very large actors can also more effectively deal with the inconvenience of having their capital locked up, by releasing it to others as a liquid staking token (LST). In addition to the direct questions of small vs large stakers, there is also the question of whether or not there is (or will be) too much staked ETH.

The Scourge, 2023 roadmap

This year, there have been significant advancements on block construction, most notably convergence on "committee inclusion lists plus some targeted solution for ordering" as the ideal solution, as well as significant research on proof of stake economics, including ideas such as two-tiered staking models and reducing issuance to cap the percent of ETH staked.

The Scourge: key goals

  • Minimize centralization risks at Ethereum's staking layer (notably, in block construction and capital provision, aka. MEV and staking pools)
  • Minimize risks of excessive value extraction from users

Fixing the block construction pipeline

What problem are we solving?

Today, Ethereum block construction is largely done through extra-protocol proposer-builder separation with MEVBoost. When a validator gets an opportunity to propose a block, they auction off the job of choosing block contents to specialized actors called builders. The task of choosing block contents that maximize revenue is very economies-of-scale intensive: specialized algorithms are needed to determine which transactions to include, in order to extract as much value as possible from on-chain financial gadgets and users' transactions interacting with them (this is what is called "MEV extraction"). Validators are left with the relatively economies-of-scale-light "dumb pipe" task of listening for bids and accepting the highest bid, as well as other responsibilities like attesting.

Stylized diagram of what MEVBoost is doing: specialized builders take on the tasks in the red, and stakers take on the tasks in blue.

There are various versions of this, including "proposer-builder separation" (PBS) and "attester-proposer separation" (APS). The difference between these has to do with fine-grained details around which responsibilities go to which of the two actors: roughly, in PBS validators still propose blocks, but receive the payload from builders, and in APS the entire slot becomes the builder's responsibility. Recently, APS has come to be preferred over PBS, because it further reduces incentives for proposers to colocate with builders. Note that APS would only apply to execution blocks, which contain transactions; consensus blocks, which contain proof-of-stake-related data such as attestations, would still be randomly assigned to validators.

This separation of powers helps keep validators decentralized, but it has one important cost: the actors that are doing the "specialized" tasks can easily become very centralized. Here's Ethereum block building today:

Two actors are choosing the contents of roughly 88% of Ethereum blocks. What if those two actors decide to censor a transaction? The answer is not quite as bad as it might seem: they are not able to reorg blocks, and so you don't need 51% censoring to prevent a transaction from getting included at all: you need 100%. With 88% censoring, a user would need to wait an average of 9 slots to get included (technically, an average of 114 seconds, instead of 6 seconds). For some use cases, waiting for two or even five minutes for certain transactions is fine. But for other use cases, eg. defi liquidations, even the ability to delay inclusion of someone else's transaction by a few blocks is a significant market manipulation risk.
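The waiting-time reasoning above can be sketched as a simple geometric model. This is an illustration under stated assumptions, not the article's own calculation: each slot is assumed to be built independently by a censoring builder with a fixed probability, slots are assumed to be 12 seconds, and a transaction is assumed to arrive at a uniformly random point within a slot. The function name and parameters are hypothetical.

```python
def expected_inclusion_delay(censoring_share: float,
                             slot_seconds: float = 12.0) -> tuple[float, float]:
    """Return (expected slots, expected seconds) until a non-censoring block.

    Assumes each slot is independently built by a censoring builder with
    probability `censoring_share` (a simplification of the real market).
    """
    if not 0.0 <= censoring_share < 1.0:
        raise ValueError("censoring_share must be in [0, 1)")
    honest_share = 1.0 - censoring_share
    # Mean of a geometric distribution: number of slots until the first
    # slot produced by a non-censoring builder.
    expected_slots = 1.0 / honest_share
    # Half a slot on average until the next block, plus one full slot
    # for every additional censored block before inclusion.
    expected_seconds = slot_seconds / 2 + (expected_slots - 1.0) * slot_seconds
    return expected_slots, expected_seconds


slots, seconds = expected_inclusion_delay(0.88)
baseline_slots, baseline_seconds = expected_inclusion_delay(0.0)
```

With no censorship the model gives the 6-second baseline mentioned above. With an 88% censoring share it gives roughly 8.3 slots, the same ballpark as the figure quoted in the text; the exact number depends on rounding and on how partial slots are counted.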

The strategies that block builders can employ to maximize revenue can also have other negative consequences for users. A "sandwich attack" could cause users making token swaps to suffer significant losses from slippage. The transactions introduced to make these attacks clog the chain, increasing gas prices for other users.

What is it, and how does it work?

The leading solution is to break down the block production task further: we give the task of choosing transactions back to the proposer (ie. a staker), and the builder can only choose the ordering and insert some transactions of their own. This is what inclusion lists seek to do.

At time T, a randomly selected staker creates an inclusion list, a list of transactions that are valid given the current state of the blockchain at that time. At time T+1, a block builder, perhaps chosen through an in-protocol auction mechanism ahead of time, creates a block. This block is required to include every transaction in the inclusion list, but they can choose the order, and they can add in their own transactions.
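The rule described above can be sketched as a validity check on the builder's block. This is a minimal sketch, not a rendering of any specification: the `Tx` type and the function name are hypothetical, and edge cases that real designs have to handle (a full block, or a listed transaction that became invalid by time T+1) are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    tx_hash: str   # hypothetical identifier for the transaction
    sender: str


def block_satisfies_inclusion_list(block_txs: list[Tx],
                                   inclusion_list: list[Tx]) -> bool:
    """True if every transaction on the inclusion list appears in the block.

    The position of each transaction is deliberately not checked: the
    builder keeps full control over ordering and may add its own
    transactions anywhere.
    """
    included = {tx.tx_hash for tx in block_txs}
    return all(tx.tx_hash in included for tx in inclusion_list)


# The builder reorders the listed transactions and inserts one of its own.
inclusion_list = [Tx("0xaa", "alice"), Tx("0xbb", "bob")]
block = [Tx("0xcc", "builder"), inclusion_list[1], inclusion_list[0]]
ok = block_satisfies_inclusion_list(block, inclusion_list)
```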

Fork-choice-enforced inclusion lists (FOCIL) proposals involve a committee of multiple inclusion list creators per block. To delay a transaction by one block, k of k inclusion list creators (eg. k = 16) would have to censor the transaction. The combination of FOCIL with a final proposer chosen by auction that is required to include the inclusion lists, but can reorder and add new transactions, is often called "FOCIL + APS".
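To get a feel for the "k of k" requirement, here is a small sketch. It assumes, purely for illustration, that each inclusion list creator is sampled independently and censors with the same probability; neither assumption comes from the FOCIL proposals themselves, and the function name is hypothetical.

```python
def prob_delayed_one_block(censoring_fraction: float, k: int = 16) -> float:
    """Probability that all k inclusion list creators censor a transaction.

    Assumes independent creators that each censor with probability
    `censoring_fraction` (an illustrative simplification).
    """
    if not 0.0 <= censoring_fraction <= 1.0:
        raise ValueError("censoring_fraction must be in [0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    return censoring_fraction ** k


# Even if half of all stakers censored, all 16 creators censoring at once
# would happen with probability 0.5 ** 16 = 1 / 65536.
p = prob_delayed_one_block(0.5, k=16)
```

Under these assumptions the chance of even a one-block delay falls off exponentially in k, which is the point of using a committee rather than a single list creator.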

A different approach to the problem is multiple concurrent proposers (MCP) schemes such as BRAID. BRAID seeks to avoid splitting up the block proposer role into a low-economies-of-scale part and a high-economies-of-scale part, and instead tries to distribute the block production process among many actors, in such a way that each proposer only needs to have a medium amount of sophistication to maximize their revenue. MCP works by having k parallel proposers generate lists of transactions, and then using a deterministic algorithm (eg. order by highest-to-lowest fee) to choose the order.
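The deterministic merge step can be sketched as follows. This is only an illustration of "order by highest-to-lowest fee", not BRAID's actual algorithm: the `Tx` type, the deduplication rule and the tie-break on transaction hash are assumptions added to make the sketch well-defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    tx_hash: str
    priority_fee: int  # hypothetical unit, eg. wei per gas


def merge_proposals(proposals: list[list[Tx]]) -> list[Tx]:
    """Merge the lists from k parallel proposers into one ordered list.

    Duplicates (the same transaction seen by several proposers) are kept
    once. Ordering is by fee, highest first, with the hash as an assumed
    tie-break so that every node derives exactly the same order.
    """
    unique: dict[str, Tx] = {}
    for txs in proposals:
        for tx in txs:
            unique.setdefault(tx.tx_hash, tx)
    return sorted(unique.values(), key=lambda tx: (-tx.priority_fee, tx.tx_hash))


proposals = [
    [Tx("0xa", 5), Tx("0xb", 9)],
    [Tx("0xb", 9), Tx("0xc", 7)],
]
ordered = [tx.tx_hash for tx in merge_proposals(proposals)]
```

Because the order is a pure function of the submitted lists, no single proposer decides the final ordering; what each proposer still controls is which transactions it submits and when.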

BRAID does not seek to attain the goal of dumb-pipe block proposers running default software being optimal. Two easy-to-understand reasons why it cannot do so are:

  1. Last-mover arbitrage attacks: suppose that the average time that proposers submit is T, and the last possible time you can submit and still get included is around T+1. Now, suppose that on centralized exchanges, the ETH/USDC price moves from $2500 to $2502 between T and T+1. A proposer can wait an extra second and add an additional transaction to arbitrage on-chain decentralized exchanges, claiming up to $2 per ETH in profit. Sophisticated proposers who are very well-connected to the network have more ability to do this.
  2. Exclusive order flow: users have the incentive to send transactions directly to one single proposer, to minimize their vulnerability to front-running and other attacks. Sophisticated proposers have an advantage because they can set up infrastructure to accept these direct-from-user transactions, and they have stronger reputations so users who send them transactions can trust that the proposer will not betray and front-run them (this can be mitigated with trusted hardware, but then trusted hardware has trust assumptions of its own).

In BRAID, attesters can still be separated off and run as a dumb-pipe functionality.

In addition to these two extremes, there is a spectrum of possible designs in between. For example, you could auction off a role that only has the right to append to a block, and not to reorder or prepend. You could even let them append or prepend, but not insert in the middle or reorder.

Encrypted mempools

One technology that is crucial to the successful implementation of many of these designs (specifically, either BRAID or a version of APS where there are strict limits on the capability being auctioned off) is encrypted mempools. Encrypted mempools are a technology where users broadcast their transactions in encrypted form, along with some kind of proof of their validity, and the transactions are included into blocks in encrypted form, without the block builder knowing the contents. The contents of the transactions are revealed later.

The main challenge in implementing encrypted mempools is coming up with a design that ensures that transactions do all get revealed later: a simple "commit and reveal" scheme does not work, because if revealing is voluntary, the act of choosing to reveal or not reveal is itself a kind of "last-mover" influence on a block that could be exploited. The two leading techniques for this are (i) threshold decryption, and (ii) delay encryption, a primitive closely related to verifiable delay functions (VDFs).
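To see why plain commit-and-reveal falls short, consider this minimal hash-based sketch. It is not an encrypted mempool design and does not implement threshold decryption or delay encryption; the function names are hypothetical and SHA-256 with a random salt is just one conventional way to build a commitment.

```python
import hashlib
import secrets


def commit(tx_bytes: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, salt). Only the commitment is published up front."""
    salt = secrets.token_bytes(32)
    return hashlib.sha256(salt + tx_bytes).digest(), salt


def verify_reveal(commitment: bytes, salt: bytes, tx_bytes: bytes) -> bool:
    """Check that a later reveal matches the earlier commitment."""
    return hashlib.sha256(salt + tx_bytes).digest() == commitment


commitment, salt = commit(b"swap 10 ETH for USDC")
# The commitment is included in a block. Nothing in this scheme forces the
# sender to publish `salt` and the transaction afterwards: they can look at
# the market first and simply stay silent if revealing is no longer
# profitable. That optional step is the "last-mover" influence.
revealed_ok = verify_reveal(commitment, salt, b"swap 10 ETH for USDC")
```

Threshold decryption and delay encryption both aim to remove that optional step, so that the contents become public whether or not the sender cooperates.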

What are some links to existing research?

What is left to do, and what are the tradeoffs?

We can think of all of the above schemes as being different ways of dividing up the authority involved in staking, arranged on a spectrum from lower economies of scale ("dumb-pipe") to higher economies of scale ("specialization-friendly"). Pre-2021, all of these authorities were bundled together in one actor:

The core conundrum is this: any meaningful authority that remains in the hands of stakers is authority that could end up being "MEV-relevant". We want a highly decentralized set of actors to have as much authority as possible; this implies (i) putting a lot of authority in the hands of stakers, and (ii) making sure stakers are as decentralized as possible, meaning that they have few economies-of-scale-driven incentives to consolidate. This is a difficult tension to navigate.

We can view FOCIL + APS as follows. Stakers continue to have the authority on the left part of the spectrum, while the right part of the spectrum gets auctioned off to the highest bidder.

BRAID is quite different. The "staker" piece is larger, but it gets split into two pieces: light stakers and heavy stakers. Meanwhile, because transactions are ordered in decreasing order of priority fee, the top-of-block choice gets de-facto auctioned off via the fee market, in a scheme that can be viewed as analogous to enshrined PBS.

Note that the safety of BRAID depends heavily on encrypted mempools; otherwise, the top-of-block auction mechanism becomes vulnerable to strategy-stealing attacks (essentially: copying other people's transactions, swapping the recipient address, and paying a 0.01% higher fee). This need for pre-inclusion privacy is also the reason why enshrined PBS is so tricky to implement.

Finally, more "aggressive" versions of FOCIL + APS, eg. the option where APS only determines the end of the block, look like this:

The main remaining task is to (i) work on solidifying the various proposals and analyzing their consequences, and (ii) combine this analysis with an understanding of the Ethereum community's goals in terms of what forms of centralization it will tolerate. There is also work to be done on each individual proposal, such as:

  • Continuing work on encrypted mempool designs, and getting to the point where we have a design that is both robust and reasonably simple, and plausibly ready for inclusion.
  • Optimizing the design of multiple inclusion lists to make sure that (i) it does not waste data, particularly in the context of inclusion lists covering blobs, and (ii) it is friendly to stateless validators.
  • More work on the optimal auction design for APS.

Additionally, it's worth noting that these different proposals are not necessarily mutually incompatible forks in the road. For example, implementing FOCIL + APS could easily serve as a stepping stone to implementing BRAID. A valid conservative strategy would be a "wait-and-see" approach where we first implement a solution where stakers' authority is limited and most of the authority is auctioned off, and then slowly increase stakers' authority over time as we learn more about how the MEV market operates on the live network.

How does it interact with other parts of the roadmap?

There are positive interactions between solving one staking centralization bottleneck and solving the others. To give an analogy, imagine a world where starting your own company required growing your own food, making your own computers and having your own army. In this world, only a few companies could exist. Solving one of the three problems would help the situation, but only a little. Solving two problems would help more than twice as much as solving one. And solving three would be far more than three times as helpful - if you're a solo entrepreneur, either 3/3 problems are solved or you stand no chance.

In particular, the centralization bottlenecks for staking are:

  • Block construction centralization (this section)
  • Staking centralization for economic reasons (next section)
  • Staking centralization because of the 32 ETH minimum (solved with Orbit or other techniques; see the post on the Merge)
  • Staking centralization because of hardware requirements (solved in the Verge, with stateless clients and later ZK-EVMs)

Solving any one of the four increases the gains from solving any of the others.

Additionally, there are interactions between the block construction pipeline and the single slot finality design, particularly in the context of trying to reduce slot times. Many block construction pipeline designs end up increasing slot times. Many block construction pipelines involve roles for attesters at multiple steps in the process. For this reason, it can be worth thinking about the block construction pipelines and single slot finality simultaneously.

Fixing staking economics

What problem are we solving?

Today, about 30% of the ETH supply is staked. This is far more than enough to protect Ethereum from 51% attacks. If the percent of ETH staked grows much larger, researchers fear a different scenario: the risks that would arise if almost all ETH becomes staked. These risks include:

  • Staking turns from being a profitable task for specialists into a duty for all ETH holders. Hence, the average staker would be much less enthusiastic, and would choose the easiest approach (realistically, delegating their tokens to whichever centralized operator offers the most convenience)
  • Credibility of the slashing mechanism weakens if almost all ETH is staked
  • A single liquid staking token could take over the bulk of the stake and even take over "money" network effects from ETH itself
  • Ethereum needlessly issuing an extra ~1m ETH/year. In the case where one liquid staking token gains a dominant network effect, a large portion of this value could potentially even get captured by the LST.
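The extra-issuance point in the list above can be checked with rough arithmetic. The sketch below relies on assumptions that are not stated in the article: annual issuance under the current reward curve is approximated as 166.3 times the square root of the ETH staked, and total supply is taken to be about 120m ETH. Both figures are approximations introduced here for illustration.

```python
import math

# Assumed approximation of the current issuance curve:
# annual issuance (ETH) ~= 166.3 * sqrt(ETH staked).
ISSUANCE_CONSTANT = 166.3
# Assumed total supply, for illustration only.
TOTAL_SUPPLY_ETH = 120_000_000


def annual_issuance_eth(staked_eth: float) -> float:
    """Approximate ETH issued per year to stakers at a given stake level."""
    return ISSUANCE_CONSTANT * math.sqrt(staked_eth)


def issuance_yield(staked_eth: float) -> float:
    """Approximate issuance-only return rate per staked ETH (excludes MEV)."""
    return annual_issuance_eth(staked_eth) / staked_eth


today = annual_issuance_eth(0.30 * TOTAL_SUPPLY_ETH)   # about 30% staked
all_staked = annual_issuance_eth(TOTAL_SUPPLY_ETH)      # almost all ETH staked
extra = all_staked - today
```

Under these assumptions `extra` comes out at roughly 0.8m ETH/year, the same order of magnitude as the ~1m figure above.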

What is it, and how does it work?

Historically, one class of solution has been: if everyone staking is inevitable, and a liquid staking token is inevitable, then let's make staking friendly to having a liquid staking token that is actually trustless, neutral and maximally decentralized. One simple way to do this is to cap staking penalties at eg. 1/8, which would make 7/8 of staked ETH unslashable, and thus eligible to be put into the same liquid staking token. Another option is to explicitly create two tiers of staking: "risk-bearing" (slashable) staking, which would somehow be capped to eg. 1/8 of all ETH, and "risk-free" (unslashable) staking, which everyone could participate in.

However, one criticism of this approach is that it seems economically equivalent to something much simpler: massively reduce issuance if the stake approaches some pre-determined cap. The basic argument is: if we end up in a world where the risk-bearing tier has 3.4% returns and the risk-free tier (which everyone participates in) has 2.6% returns, that's actually the same thing as a world where staking ETH has 0.8% returns and just holding ETH has 0% returns. The dynamics of the risk-bearing tier, including both total quantity staked and centralization, would be the same in both cases. And so we should just do the simple thing and reduce issuance.
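The equivalence argument is just a shift of the baseline: what matters to a potential risk-bearing staker is the spread over the option everyone else takes. A small sketch, using basis points to avoid floating-point noise (the function and tier names are hypothetical):

```python
def returns_relative_to_baseline(returns_bps: dict[str, int],
                                 baseline: str) -> dict[str, int]:
    """Express every option's return as a spread over the baseline option."""
    base = returns_bps[baseline]
    return {name: value - base for name, value in returns_bps.items()}


# Two-tier world: 3.4% for risk-bearing stake, 2.6% for risk-free stake.
two_tier = returns_relative_to_baseline(
    {"risk_bearing": 340, "risk_free": 260}, baseline="risk_free"
)

# Reduced-issuance world: 0.8% for staking, 0% for simply holding ETH.
reduced_issuance = returns_relative_to_baseline(
    {"staking": 80, "holding": 0}, baseline="holding"
)

# In both worlds the premium for taking on slashing risk is 80 basis points.
same_premium = two_tier["risk_bearing"] == reduced_issuance["staking"]
```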

The main counterargument to this line of argument would be if we can make the "risk-free tier" still have some useful role and some level of risk (eg. as proposed by Dankrad here).

Both of these lines of proposals imply changing the issuance curve, in a way that makes returns prohibitively low if the amount of stake gets too high.

Left: one proposal for an adjusted issuance curve, by Justin Drake. Right: another set of proposals, by Anders Elowsson.

Two-tier staking, on the other hand, requires setting two return curves: (i) the return rate for "basic" (risk-free or low-risk) staking, and (ii) the premium for risk-bearing staking. There are different ways to set these parameters: for example, if you set a hard parameter that 1/8 of stake is slashable, then market dynamics will determine the premium on the return rate that slashable stake gets.

Another important topic here is MEV capture. Today, revenue from MEV (eg. DEX arbitrage, sandwiching...) goes to proposers, ie. stakers. This is revenue that is completely "opaque" to the protocol: the protocol has no way of knowing if it's 0.01% APR, 1% APR or 20% APR. The existence of this revenue stream is highly inconvenient from multiple angles:

  1. It is a volatile revenue source, as each individual staker only gets it when they propose a block, which is once every ~4 months today. This creates an incentive to join pools for more stable income.
  2. It leads to an unbalanced allocation of incentives: too much for proposing, too little for attesting.
  3. It makes stake capping very difficult to implement: even if the "official" return rate is zero, the MEV revenue alone may be enough to drive all ETH holders to stake. As a result, a realistic stake capping proposal would in fact have to have returns approach negative infinity, as eg. proposed here. This, needless to say, creates more risk for stakers, especially solo stakers.

We can solve these problems by finding a way to make MEV revenue legible to the protocol, and capturing it. The earliest proposal was Francesco's MEV smoothing; today, it's widely understood that any mechanism for auctioning off block proposer rights (or, more generally, sufficient authority to capture almost all MEV) ahead of time accomplishes the same goal.

What are some links to existing research?

What is left to do, and what are the tradeoffs?

The main remaining task is to either agree to do nothing, and accept the risks of almost all ETH being inside LSTs, or finalize and agree on the details and parameters of one of the above proposals. An approximate summary of the benefits and risks is:

| Policy | Need to decide | Risks to analyze |
|---|---|---|
| Do nothing | MEV burn implementation, if any | Almost 100% of ETH staked, likely in LSTs (perhaps a single dominant one); macroeconomic risks |
| Stake capping (via changing issuance curve) | Reward function and parameters (esp. what the cap is); MEV burn implementation | Open question of which stakers enter and leave, possibility that remaining staker set is centralized |
| Two-tiered staking | The role of the risk-free tier; parameters (eg. the economics that determine the amount staked in the risk-bearing tier); MEV burn implementation | Open question of which stakers enter and leave, possibility that risk-bearing set is centralized |

How does it interact with other parts of the roadmap?

One important point of intersection has to do with solo staking. Today, the cheapest VPSes that can run an Ethereum node cost about $60 per month, primarily due to hard disk storage costs. For a 32 ETH staker ($84,000 at the time of this writing), this decreases APY by (60 * 12) / 84000 ~= 0.85%. If total staking returns drop below 0.85%, this makes solo staking unviable for many people at these levels.
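The arithmetic above is easy to re-run for other hosting costs or ETH prices. A minimal sketch, with hypothetical function names:

```python
def node_cost_apy_drag(monthly_cost_usd: float, stake_usd: float) -> float:
    """Fraction of the stake consumed per year by node hosting costs."""
    return monthly_cost_usd * 12 / stake_usd


def solo_staking_is_viable(gross_apy: float, monthly_cost_usd: float,
                           stake_usd: float) -> bool:
    """True if staking returns exceed hosting costs (ignores all other costs)."""
    return gross_apy > node_cost_apy_drag(monthly_cost_usd, stake_usd)


# The article's numbers: $60 per month against an $84,000 stake.
drag = node_cost_apy_drag(60, 84_000)   # 720 / 84000, a little under 0.9%
```

Halving the hosting cost halves the drag, which is why lower node requirements matter more as returns fall.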

If we want solo staking to continue to be viable, this puts further emphasis on the need to reduce node operation costs, which will be done in the Verge: statelessness will remove storage space requirements, which may be sufficient on its own, and then L1 EVM validity proofs will make costs completely trivial.

On the other hand, MEV burn arguably helps solo staking. Although it decreases returns for everyone, it more importantly decreases variance, making staking less like a lottery.

Finally, any change in issuance interacts with other fundamental changes to the staking design (eg. rainbow staking). One particular point of concern is that if staking returns become very low, this means we have to choose between (i) making penalties also low, reducing disincentives against bad behavior, and (ii) keeping penalties high, which would increase the set of circumstances in which even well-meaning validators accidentally end up with negative returns if they get unlucky with technical issues or even attacks.

Application layer solutions

The above sections focused on changes to the Ethereum L1 that can solve important centralization risks. However, Ethereum is not just an L1, it is an ecosystem, and there are also important application-layer strategies that can help mitigate the above risks. A few examples include:

  • Specialized staking hardware solutions - some companies, such as Dappnode, are selling hardware that is specifically designed to make it as easy as possible to operate a staking node. One way to make this solution more effective is to ask the question: if a user is already spending the effort to have a box running and connected to the internet 24/7, what other services could it provide (to the user or to others) that benefit from decentralization? Examples that come to mind include (i) running locally hosted LLMs, for self-sovereignty and privacy reasons, and (ii) running nodes for a decentralized VPN.
  • Squad staking - this solution from Obol allows multiple people to stake together in an M-of-N format. This will likely get more and more popular over time, as statelessness and later L1 EVM validity proofs will reduce the overhead of running more nodes, and the benefit of each individual participant needing to worry much less about being online all the time starts to dominate. This is another way to reduce the cognitive overhead of staking, and ensure solo staking prospers in the future.
  • Airdrops - Starknet gave an airdrop to solo stakers. Other projects wishing to have a decentralized and values-aligned set of users may also consider giving airdrops or discounts to validators that are identified as probably being solo stakers.
  • Decentralized block building marketplaces - using a combination of ZK, MPC and TEEs, it's possible to create a decentralized block builder that participates in, and wins, the APS auction game, but at the same time provides pre-confirmation privacy and censorship resistance guarantees to its users. This is another path toward improving users' welfare in an APS world.
  • Application-layer MEV minimization - individual applications can be built in a way that "leaks" less MEV to L1, reducing the incentive for block builders to create specialized algorithms to collect it. One simple strategy that is universal, though inconvenient and composability-breaking, is for the contract to put all incoming operations into a queue and execute them in the next block, and auction off the right to jump the queue. Other more sophisticated approaches include doing more work offchain eg. as Cowswap does. Oracles can also be redesigned to minimize oracle-extractable value.
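For the squad staking item in the list above, the benefit of an M-of-N arrangement for uptime can be illustrated with a binomial calculation. This is a sketch under assumptions that are not from the article: each participant's node is online independently with the same probability, and the validator performs its duties whenever at least M of the N nodes are online.

```python
from math import comb


def prob_cluster_online(n: int, m: int, p_online: float) -> float:
    """Probability that at least m of n independent nodes are online."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    return sum(
        comb(n, k) * p_online**k * (1 - p_online) ** (n - k)
        for k in range(m, n + 1)
    )


solo = prob_cluster_online(1, 1, 0.95)    # a single node that is up 95% of the time
squad = prob_cluster_online(4, 3, 0.95)   # 3-of-4 cluster of such nodes
```

Under these assumptions a 3-of-4 cluster of nodes that are each online 95% of the time is available about 98.6% of the time, so each participant can tolerate more downtime than a solo staker could.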