Shoal Framework: Significantly Reducing the Latency of Bullshark on Aptos
Aptos Labs recently solved two important open problems in DAG BFT, significantly reducing latency and, for the first time, eliminating the need for timeouts in a deterministic practical protocol. Overall, this improves Bullshark's latency by 40% in the fault-free case and by 80% in the presence of faults.
Shoal is a framework that enhances any Narwhal-based consensus protocol (e.g., DAG-Rider, Tusk, and Bullshark) with pipelining and a leader reputation mechanism. Pipelining reduces DAG ordering latency by introducing an anchor in every round, while leader reputation further improves latency by ensuring that anchors are associated with the fastest validators. Moreover, leader reputation allows Shoal to exploit asynchronous DAG construction to eliminate timeouts in all scenarios. This enables Shoal to provide a universally responsive property, which subsumes the optimistic responsiveness that is typically required.
The technique itself is very simple: multiple instances of the underlying protocol run one after another in sequence. So, when instantiated with Bullshark, we get a "shoal" of sharks running a relay race.
Background
In the pursuit of high performance in blockchain networks, there has been a continuous focus on reducing communication complexity. However, this approach has not led to a significant increase in throughput. For example, Hotstuff implemented in early versions of Diem only achieved 3500 TPS, far below the target of over 100,000 TPS.
The recent breakthrough stems from the realization that data propagation is the main bottleneck of leader-based protocols and that it can benefit from parallelization. The Narwhal system separates data propagation from the core consensus logic and proposes an architecture in which all validators propagate data simultaneously, while the consensus component orders only a small amount of metadata. The Narwhal paper reports a throughput of 160,000 TPS.
Previously, we introduced Quorum Store, our Narwhal implementation that separates data propagation from consensus, and showed how to use it to scale the current consensus protocol, Jolteon. Jolteon is a leader-based protocol that combines Tendermint's linear fast path with PBFT-style view changes, reducing Hotstuff's latency by 33%. However, leader-based consensus protocols clearly cannot fully exploit Narwhal's throughput potential: even with data propagation separated from consensus, the leader in Hotstuff/Jolteon remains a bottleneck as throughput increases.
Therefore, we have decided to deploy Bullshark, a zero-communication-overhead consensus protocol, on top of the Narwhal DAG. Unfortunately, the DAG structure that supports Bullshark's high throughput comes with a 50% latency cost compared to Jolteon.
This article describes how Shoal significantly reduces Bullshark latency.
DAG-BFT Background
Each vertex in the Narwhal DAG is associated with a round. To enter round r, a validator must first obtain n-f vertices belonging to round r-1. Each validator can broadcast one vertex per round, and each vertex must reference at least n-f vertices from the previous round. Due to network asynchrony, different validators may observe different local views of the DAG at any given time.
A key property of the DAG is non-equivocation: if two validators have the same vertex v in their local views of the DAG, then they have exactly the same causal history of v.
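To make this structure concrete, here is a minimal Rust sketch of a round-based DAG with the n-f round-advancement rule described above; the types, field names, and rule are illustrative, not the actual Narwhal/Aptos implementation.

```rust
// Minimal sketch of a round-based DAG; illustrative only, not the Narwhal/Aptos code.
use std::collections::{HashMap, HashSet};

type Validator = u32;
type Round = u64;

#[allow(dead_code)]
struct Vertex {
    round: Round,
    author: Validator,
    // Each vertex references at least n - f vertices from the previous round.
    parents: HashSet<(Round, Validator)>,
}

struct LocalDag {
    n: usize, // total number of validators
    f: usize, // maximum number of faulty validators
    // Each validator broadcasts at most one vertex per round.
    vertices: HashMap<(Round, Validator), Vertex>,
}

impl LocalDag {
    /// A validator may advance to round r only after its local view contains
    /// n - f vertices belonging to round r - 1.
    fn can_enter_round(&self, r: Round) -> bool {
        if r == 0 {
            return true;
        }
        let prev = self
            .vertices
            .keys()
            .filter(|(round, _)| *round == r - 1)
            .count();
        prev >= self.n - self.f
    }
}

fn main() {
    let dag = LocalDag { n: 4, f: 1, vertices: HashMap::new() };
    assert!(dag.can_enter_round(0));
    assert!(!dag.can_enter_round(1)); // no round-0 vertices observed yet
}
```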
Ordering the DAG
It is possible to reach consensus on the total order of all vertices in the DAG without additional communication overhead. To this end, the validators in DAG-Rider, Tusk, and Bullshark interpret the structure of the DAG as a consensus protocol, where vertices represent proposals and edges represent votes.
Although the quorum intersection logic embedded in the DAG structure differs, all existing Narwhal-based consensus protocols share the following structure:
Predetermined anchors: every few rounds (e.g., every two rounds in Bullshark) there is a predetermined leader, and the leader's vertex is called the anchor.
Ordering anchors: validators independently but deterministically decide which anchors to order and which to skip.
Ordering causal histories: validators process their list of ordered anchors one by one, and for each anchor order all the previously unordered vertices in its causal history according to some deterministic rule.
The key to safety is ensuring that in step (2), the ordered anchor lists created by all honest validators share the same prefix. In Shoal, we make the following observation about all these protocols:
All validators agree on the first ordered anchor point.
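The hedged Rust sketch below illustrates the three-step structure above (predetermined anchors, deciding which anchors to order, and ordering causal histories); the anchor schedule and decision rule are placeholders, not the real protocol logic.

```rust
// Sketch of the generic DAG-ordering loop shared by DAG-Rider, Tusk, and Bullshark.
// The decision rule and anchor schedule are placeholders, not the real protocol.
use std::collections::HashSet;

type Round = u64;
type VertexId = (Round, u32); // (round, validator index)

/// Step 1: anchors are predetermined, e.g. one every two rounds in Bullshark.
fn anchor_rounds(max_round: Round) -> impl Iterator<Item = Round> {
    (0..=max_round).filter(|r| *r % 2 == 0)
}

/// Step 2: each validator independently but deterministically decides which
/// anchors to order and which to skip (placeholder rule here).
fn decide_to_order(anchor: VertexId, local_view: &HashSet<VertexId>) -> bool {
    local_view.contains(&anchor)
}

/// Step 3: order the not-yet-ordered causal history of each ordered anchor by
/// some deterministic rule (here: ascending (round, validator) order).
fn order_causal_history(
    history: &HashSet<VertexId>,
    already_ordered: &mut HashSet<VertexId>,
    output: &mut Vec<VertexId>,
) {
    let mut fresh: Vec<VertexId> = history
        .iter()
        .copied()
        .filter(|v| !already_ordered.contains(v))
        .collect();
    fresh.sort();
    for v in fresh {
        already_ordered.insert(v);
        output.push(v);
    }
}

fn main() {
    let local_view: HashSet<VertexId> =
        [(0, 0), (0, 1), (1, 0), (2, 0)].into_iter().collect();
    let mut ordered = HashSet::new();
    let mut log = Vec::new();
    for r in anchor_rounds(2) {
        let anchor = (r, 0); // assume validator 0 leads round r
        if decide_to_order(anchor, &local_view) {
            // In the real protocol the causal history is read from the DAG edges.
            let history: HashSet<VertexId> = local_view
                .iter()
                .copied()
                .filter(|(round, _)| *round <= r)
                .collect();
            order_causal_history(&history, &mut ordered, &mut log);
        }
    }
    println!("ordered vertices: {:?}", log);
}
```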
Bullshark latency
The latency of Bullshark depends on the number of rounds between ordered anchors in the DAG. Although the partially synchronous version of Bullshark (the most practical one) has better latency than the asynchronous version, it is still far from optimal.
Problem 1: average block latency. In Bullshark, every even round has an anchor, and the vertices of every odd round are interpreted as votes. In the common case, it takes two rounds of the DAG to order an anchor; however, the vertices in an anchor's causal history need additional rounds to wait for an anchor to be ordered: in the common case, vertices in odd rounds need three rounds, and non-anchor vertices in even rounds need four.
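As a small worked example of these common-case figures, the sketch below assumes anchors live in even rounds and that the anchor of round r is ordered once round r+2 is reached, as described above; it is illustrative only.

```rust
// Illustrative common-case commit latency, in DAG rounds, for a vertex:
// anchors need 2 rounds, odd-round vertices 3, even-round non-anchors 4.
fn rounds_to_commit(vertex_round: u64, is_anchor: bool) -> u64 {
    // First anchor whose causal history contains the vertex (anchors in even rounds).
    let covering_anchor_round = if vertex_round % 2 == 0 {
        if is_anchor { vertex_round } else { vertex_round + 2 }
    } else {
        vertex_round + 1
    };
    // That anchor is ordered once round covering_anchor_round + 2 is reached.
    (covering_anchor_round + 2) - vertex_round
}

fn main() {
    assert_eq!(rounds_to_commit(4, true), 2);  // anchors: 2 rounds
    assert_eq!(rounds_to_commit(5, false), 3); // odd-round vertices: 3 rounds
    assert_eq!(rounds_to_commit(4, false), 4); // even-round non-anchors: 4 rounds
}
```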
Problem 2: latency under failures. The above latency analysis applies to the fault-free case; on the other hand, if a round's leader fails to broadcast its anchor quickly enough, the anchor cannot be ordered (and is therefore skipped). All the unordered vertices from previous rounds must then wait for the next anchor to be ordered. This significantly hurts performance in a geo-replicated network, especially since Bullshark uses timeouts to wait for leaders.
Shoal Framework
Shoal addresses these two latency issues by enhancing Bullshark (or any other Narwhal-based BFT protocol) through pipelining, allowing for an anchor point in every round and reducing the latency of all non-anchor vertices in the DAG to three rounds. Shoal also introduces a zero-cost leader reputation mechanism in the DAG, which biases the selection towards fast leaders.
Challenge
In the context of DAG protocols, pipelining and leader reputation are considered difficult problems for the following reasons:
Previous pipeline attempts tried to modify the core Bullshark logic, but this seems to be fundamentally impossible.
Leader reputation was introduced in DiemBFT and formalized in Carousel; the idea is to dynamically select future leaders (anchors in Bullshark) based on the past performance of validators. Although disagreement on the leader's identity does not violate safety in those protocols, in Bullshark it may lead to completely different orderings. This gets to the heart of the issue: dynamically and deterministically selecting round anchors is necessary to solve consensus, yet validators need to agree on the ordered history in order to select future anchors.
As evidence of the difficulty of the problem, we note that implementations of Bullshark, including the one currently running in production, do not support these features.
![In-depth Explanation of the Shoal Framework: How to Reduce Bullshark latency on Aptos?](https://img-cdn.gateio.im/webp-social/moments-859e732e16c3eee0e2c93422474debc2.webp)
Protocol
Despite the above challenges, the solution turns out to be hidden behind simplicity.
In Shoal, we rely on the ability to perform local computations over the DAG, which lets us save and reinterpret information from earlier rounds. Using the core insight that all validators agree on the first ordered anchor, Shoal sequentially composes multiple instances of Bullshark and pipelines them, making (1) the first ordered anchor the switching point between instances, and (2) the anchor's causal history the input used to calculate the leaders' reputation.
Pipeline
Recall that Bullshark predetermines its anchors via a mapping F: R → V that maps rounds to leaders. Shoal runs instances of Bullshark one after another, so that for each instance the anchors are predetermined by its mapping F. Each instance orders one anchor, which triggers the switch to the next instance.
Initially, Shoal launches the first instance of Bullshark in the first round of the DAG and runs it until the first anchor is ordered, say in round r. All validators agree on this anchor, so they can all confidently reinterpret the DAG starting from round r+1. Shoal simply launches a new instance of Bullshark in round r+1.
In the best case, this allows Shoal to order one anchor per round: the anchor of the first round is ordered by the first instance; Shoal then starts a new instance for the second round, which orders its own anchor; another new instance orders the anchor in the third round, and the process continues.
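A minimal sketch of this instance-switching loop is shown below, under the simplifying assumption that each Bullshark instance can be treated as a black box that reports the round of its first ordered anchor (the best case described above); it is illustrative, not the Aptos code.

```rust
// Sketch of Shoal-style pipelining: when an instance orders its first anchor,
// the next instance reinterprets the DAG starting from the following round.
type Round = u64;

/// Stand-in for running one Bullshark instance from `first_round` until it
/// orders its first anchor; returns the round of that anchor.
fn run_instance_until_first_anchor(first_round: Round) -> Round {
    // Best case described above: the anchor of the instance's first round is ordered.
    first_round
}

fn main() {
    let mut first_round: Round = 1;
    let mut ordered_anchor_rounds = Vec::new();
    for _ in 0..5 {
        // Run the current instance until its first anchor is ordered ...
        let anchor_round = run_instance_until_first_anchor(first_round);
        ordered_anchor_rounds.push(anchor_round);
        // ... then switch: the next instance starts from the next round.
        first_round = anchor_round + 1;
    }
    // In the best case this yields one ordered anchor per round.
    assert_eq!(ordered_anchor_rounds, vec![1, 2, 3, 4, 5]);
}
```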
![Detailed Explanation of the Shoal Framework: How to Reduce Bullshark latency on Aptos?](https://img-cdn.gateio.im/webp-social/moments-9f789cb669f6fcc244ea7ff7648e48b4.webp)
Leader Reputation
When Bullshark skips an anchor during ordering, latency increases. In that case, pipelining is powerless, because a new instance cannot be started before the previous instance orders its anchor. Shoal therefore uses a reputation mechanism to assign each validator a score based on its recent activity, ensuring that leaders responsible for missing anchors are less likely to be selected in the future. Validators that respond and participate in the protocol receive high scores; otherwise, a validator is assigned a low score, since it may be crashed, slow, or malicious.
The idea is to deterministically recompute the predefined mapping F from rounds to leaders whenever the scores are updated, biasing it towards leaders with higher scores. For validators to agree on the new mapping, they must agree on the scores, and hence on the history used to derive them.
In Shoal, pipelining and leader reputation combine naturally, since they both rely on the same core technique: reinterpreting the DAG after reaching agreement on the first ordered anchor.
In fact, the only difference is that after ordering the anchor of round r, validators compute a new mapping F', valid from round r+1, based on the causal history of the ordered anchor of round r. They then execute a new instance of Bullshark from round r+1 using the updated anchor-selection function F'.
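The sketch below illustrates this reputation step under placeholder rules: the score is simply the number of vertices each validator contributed to the anchor's causal history, and the new mapping F' is an arbitrary deterministic, score-biased assignment; neither rule is the production logic.

```rust
// Sketch of the reputation step: recompute scores from the ordered anchor's
// causal history, then derive a new round-to-leader mapping F' from them.
use std::collections::HashMap;

type Round = u64;
type Validator = u32;

/// Placeholder scoring rule: count how many vertices each validator
/// contributed to the ordered anchor's causal history.
fn recompute_scores(causal_history: &[(Round, Validator)]) -> HashMap<Validator, u64> {
    let mut scores = HashMap::new();
    for &(_, v) in causal_history {
        *scores.entry(v).or_insert(0) += 1;
    }
    scores
}

/// Placeholder mapping rule: rank validators by (score, id) and assign future
/// rounds round-robin over the top half. Any deterministic, score-biased rule
/// on which validators agree would do.
fn derive_mapping(scores: &HashMap<Validator, u64>, n: u32) -> impl Fn(Round) -> Validator {
    let mut ranked: Vec<Validator> = (0..n).collect();
    ranked.sort_by_key(|v| std::cmp::Reverse((scores.get(v).copied().unwrap_or(0), *v)));
    let preferred: Vec<Validator> = ranked.into_iter().take((n as usize / 2).max(1)).collect();
    move |round: Round| preferred[(round as usize) % preferred.len()]
}

fn main() {
    // Causal history of the anchor ordered in round r (illustrative data).
    let history = vec![(1, 0), (1, 2), (2, 0), (2, 2), (2, 3)];
    let scores = recompute_scores(&history);
    let f_prime = derive_mapping(&scores, 4);
    // The new instance starting at round r + 1 uses F' to pick its anchors.
    println!("leader of round 3: {}", f_prime(3));
}
```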
![A Comprehensive Explanation of the Shoal Framework: How to Reduce Bullshark latency on Aptos?](https://img-cdn.gateio.im/webp-social/moments-1baf540693f376d93cb18ef3193593cc.webp)
No more timeouts
Timeouts play a crucial role in all leader-based deterministic partial synchronous BFT implementations. However, the complexity they introduce increases the number of internal states that need to be managed and monitored, which adds complexity to the debugging process and requires more observability techniques.
Timeouts can also significantly increase latency, because it is difficult to configure them properly and they often need to be adjusted dynamically, since they are highly dependent on the environment (network). Before moving to the next leader, the protocol pays the full timeout latency penalty for a failed leader.