Blockchain protocol Solana has released a report detailing the recent outage on its Mainnet Beta cluster that caused network congestion and a degradation of block finalization times.
On February 25th, 2023, the Solana Mainnet Beta cluster experienced long block finalization times due to congestion within the primary block-propagation protocol, "Turbine." The issue began when a malfunctioning validator broadcast an abnormally large block, overwhelming the network's deduplication logic.
Core engineers identified the issue as a failure of deduplication logic in shred-forwarding services and the retransmission pipeline. Enhancements to the deduplication logic have been implemented in Solana Labs validator clients v1.13.7 and v1.14.17 to mitigate saturation and improve network resiliency.
During the outage, block leaders entered vote-only mode. Normal block production resumed on February 26th, with no rollback of finalized transactions. According to the report, the root cause was that block-forwarding services malfunctioned when they encountered an abnormally large block, which saturated the Turbine protocol and overwhelmed its filtering logic.
Solana's Turbine block propagation protocol breaks blocks into chunks called "shreds" and broadcasts them to the cluster. Turbine is designed as a tree of star networks called "neighborhoods," ensuring an upper bound on the workload for each node and a single loop-free path for each shred.
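To make the idea concrete, here is a minimal Python sketch of the two properties described above: splitting a block into shreds, and arranging nodes in a tree where each node forwards to a bounded number of children and has exactly one parent. It is not Solana's implementation. The function names, the 1024-byte shred size, and the fanout of 3 are illustrative assumptions chosen only for readability.

```python
from typing import List, Optional


def shred_block(block: bytes, shred_size: int = 1024) -> List[bytes]:
    """Split a block into fixed-size chunks ("shreds").

    The shred size is an assumed, illustrative value; the last shred may be shorter.
    """
    return [block[i:i + shred_size] for i in range(0, len(block), shred_size)]


def children_of(index: int, fanout: int, node_count: int) -> List[int]:
    """Children of a node in a complete fanout-ary tree stored as an array.

    Node 0 is the root. Each node forwards to at most `fanout` children,
    which bounds its workload.
    """
    first = index * fanout + 1
    return [c for c in range(first, first + fanout) if c < node_count]


def parent_of(index: int, fanout: int) -> Optional[int]:
    """Each non-root node has exactly one parent, so a shred reaches it by one path."""
    return None if index == 0 else (index - 1) // fanout


if __name__ == "__main__":
    shreds = shred_block(b"x" * 2500)
    print([len(s) for s in shreds])          # [1024, 1024, 452]
    print(children_of(0, fanout=3, node_count=10))  # [1, 2, 3]
    print(parent_of(7, fanout=3))            # 2
```

Because every node other than the root has a single parent, a shred that follows the tree can reach each node only once. The outage described in the report arose when forwarding no longer stayed within that structure.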
The network degradation occurred when recovery shreds overwhelmed the deduplication logic. With the deduplication filters saturated, shreds began looping between nodes in the Turbine tree and duplicates were retransmitted continuously, which overwhelmed Turbine and slowed block finalization.
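The failure mode can be illustrated with a toy deduplication filter. The sketch below is a hypothetical example, not the filter used in the Solana validator client: `BoundedDedup`, its capacity, and the eviction policy (drop the oldest entry) are all assumptions. It shows how a filter with limited memory stops recognizing duplicates once the number of distinct shreds in flight exceeds what it can remember.

```python
from collections import OrderedDict
from typing import Hashable, Iterable


class BoundedDedup:
    """Toy duplicate filter that remembers at most `capacity` shred ids (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._seen: "OrderedDict[Hashable, None]" = OrderedDict()

    def should_forward(self, shred_id: Hashable) -> bool:
        """Return True if the shred looks new and should be retransmitted."""
        if shred_id in self._seen:
            return False  # recognized duplicate: drop it
        self._seen[shred_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict the oldest remembered id
        return True


def count_forwarded(dedup: BoundedDedup, shred_ids: Iterable[Hashable]) -> int:
    """Count how many of the given shreds the filter lets through."""
    return sum(1 for s in shred_ids if dedup.should_forward(s))


if __name__ == "__main__":
    # Filter large enough for the block: the second copy of every shred is dropped.
    healthy = BoundedDedup(capacity=16)
    count_forwarded(healthy, range(10))
    print(count_forwarded(healthy, range(10)))    # 0

    # Filter too small for the block: old ids are evicted before their
    # duplicates arrive, so every duplicate is forwarded again.
    saturated = BoundedDedup(capacity=4)
    count_forwarded(saturated, range(10))
    print(count_forwarded(saturated, range(10)))  # 10
```

In the saturated case, every looping duplicate is treated as new and retransmitted, which is the kind of runaway forwarding the report attributes to the abnormally large block.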
The improved deduplication logic shipped in validator clients v1.13.7 and v1.14.17 is intended to prevent similar congestion in the future and to make the Turbine protocol more resilient.