In this post, we study the problem of amortizing costs when one must produce many proofs simultaneously. The method, which we call “packing,” applies to FRI-based SNARKs (e.g., ethSTARK, Plonky2, RISC0, Boojum). It allows the generation of a single proof that many witnesses satisfy some constraints while reducing the verifier’s work and the proof size (compared to producing a proof for each witness and having the verifier check each of these). We show through experimentation that the technique is useful in practice.
In an upcoming technical paper, we will delve deeper into the packing mechanism and discuss other methods for aggregating FRI-based SNARK proving tasks. This work can be seen as a continuation of the community’s recent efforts in designing accumulation methods for SNARK-proving tasks (e.g., folding schemes). However, in the setting of FRI-based SNARKs, one faces the inconvenience of not having homomorphic commitment schemes, which requires approaches substantially different from those developed recently.
In more detail, packing is a batch-proof generation method that allows a prover to produce a single proof of the satisfiability of witnesses for potentially different instances. The main improvement brought by packing is that the prover and verifier perform only one FRI low-degree test across all instances. Compared to sequentially proving the satisfiability of each witness, a packed proof reduces the verifier time and the proof size, along with a slight reduction in prover time. Our benchmarks show that for typical traces (2¹⁶ rows, 100 columns), the improvement in verifier time is ~2x, and the reduction in proof size is ~3x.
As we explain later, this method could be particularly relevant in combination with long recursive proving tasks. For example, when a single prover recursively generates a proof of the validity of several witnesses, the prover could use packing between “recursive layers” to prove statements about smaller verifiers and smaller witnesses at the next layer.
We remark that the packing method can “aggregate” proof tasks for potentially different relations and types of FRI-based SNARK proofs. E.g., for two languages L₁, L₂ and two instances x₁, x₂, packing can amortize the cost of proving that x₁∈L₁ with, say, ethSTARK, together with the cost of proving that x₂∈L₂ with, say, Plonky2.
In the next section, we briefly overview the packing technique and its performance improvements. Afterward, we discuss some of its potential applications. We follow that by presenting experimental results. We postpone a formal description of the technique and its corresponding soundness analysis to our upcoming technical paper (which will also describe further optimization techniques).
We discuss our approach specifically for the ethSTARK [Sta21] protocol (sometimes called STARK). However, the technique can be adapted to any “FRI-based SNARK” (in the sense of [BGK+23]). Recall that the ethSTARK IOP follows the blueprint below, where 𝔽 denotes a finite field and 𝕂 a finite extension of 𝔽:

1. The prover commits to the evaluations of the witness (trace) polynomials over an evaluation domain D.
2. The verifier sends random challenges, which the prover uses to combine the constraints into a composition polynomial (ALI), committing to its evaluations over D.
3. The verifier samples an out-of-domain point z∈𝕂; the prover reveals the evaluations of the committed polynomials needed at z, from which both parties define the DEEP quotient maps.
4. The prover and verifier run the Batch-FRI low-degree test on a random linear combination of the witness and quotient maps (the combined DEEP-ALI constraint).
The main observation in our technique is that by using the linearity of the DEEP-ALI constraint combination and the Batch-FRI combination, we may pack constraints and functions across proof generations for different instances into a single combined constraint and a single DEEP-FRI function. In other words, when generating multiple ethSTARK proofs, one can take a random linear combination of all the DEEP-ALI constraints from all the individual proofs. Then, the prover and verifier execute Step 4 above for the resulting “global” DEEP-ALI constraint.
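As a toy sketch of this combination step (in Python, over an illustrative prime field and with hypothetical names; real instantiations work over an extension 𝕂 of a FRI-friendly field 𝔽):

```python
P = 2**31 - 1  # toy prime modulus (an illustrative assumption, not a FRI-friendly field)

def pack_deep_functions(deep_evals_per_proof, challenges):
    """Combine the DEEP quotient evaluations of m proofs over a shared
    evaluation domain D into a single function, via a random linear
    combination with verifier challenges alpha_1, ..., alpha_m."""
    domain_size = len(deep_evals_per_proof[0])
    packed = [0] * domain_size
    for alpha, evals in zip(challenges, deep_evals_per_proof):
        for i, e in enumerate(evals):
            packed[i] = (packed[i] + alpha * e) % P
    return packed
```

Since each summand is low-degree, so is their random linear combination, which is why a single FRI low-degree test over the packed function suffices (with soundness loss growing in the number of batched functions).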
When applying Batch-FRI, we must also reveal evaluations over the evaluation domain D of the batched maps. When instantiated with Merkle commitments, we pack all the evaluations of the witness and quotient maps at the same point in D into the same Merkle leaf (similarly to [MAGABMMT23]). I.e., following the notation above, a leaf in our Merkle trees looks like Hash(h₁(a), …, hₘ(a)), where a∈D and h₁, …, hₘ are all the witness and quotient maps being committed in the proofs. Note that this does not apply to Merkle trees in the inner rounds of Batch-FRI.
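A minimal sketch of such a packed Merkle commitment, assuming SHA-256 and an 8-byte field-element encoding (both illustrative choices, not the layout of any particular library):

```python
import hashlib

def packed_merkle_root(evals_per_map):
    """evals_per_map[j][i] holds h_j(a_i); the domain size must be a power of two.
    Each leaf packs the evaluations of all m maps at one domain point."""
    n = len(evals_per_map[0])
    # Leaf i is Hash(h_1(a_i), ..., h_m(a_i)).
    level = [
        hashlib.sha256(b"".join(e[i].to_bytes(8, "big") for e in evals_per_map)).digest()
        for i in range(n)
    ]
    # Standard binary Merkle tree over the packed leaves.
    while len(level) > 1:
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

With this layout, a single authentication path for a queried point a authenticates the evaluations of all m maps at once, which is where the savings in decommitment paths come from.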
In what follows, for the satisfiability of N witnesses for N instances (these could be Plonkish, AIR, RAP instances, etc.), the “sequential verifier” will refer to the verifier that checks a FRI-based proof (produced using, e.g., Plonky2, ethSTARK, or RISC0) for each instance sequentially. The “packed verifier” will refer to the verifier that checks the StarkPack proof for the simultaneous satisfiability of all witnesses. The same terminology applies to the provers.
The improvement over proving the satisfiability of different instances sequentially mainly comes from the fact that there are now fewer Merkle commitment trees and that we only perform one FRI proof for the satisfiability of all instances. The first point holds because we pack the evaluations of all relevant functions across instances into the same Merkle leaf at each round. The second holds, as explained above, because we combine all quotient functions into a single DEEP-FRI function.
More concretely, in terms of proof size, the proof contains fewer Merkle root commitments than the sequential proof. It further includes fewer Merkle decommitment paths since there is a single execution of FRI across all packed instances. There is no amortization, however, for the number of function evaluations the packed prover needs to send.
Regarding the packed verifier’s work, the improvement comes from the reduced number of hash operations it needs to perform, since the packed verifier checks fewer Merkle decommitment proofs. More concretely, for the Merkle trees involving witness functions and quotient functions q(X), the packed verifier hashes bigger leaves when checking Merkle membership proofs, but there are fewer of them. Further, the packed verifier checks Merkle decommitment paths corresponding to a single execution of FRI.
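The following back-of-envelope model (hypothetical parameter names, not measured data) illustrates this accounting: with per-instance trees, the verifier pays one authentication path per query per instance, while a packed tree costs one path per query plus one leaf hash whose size grows with the number of packed instances.

```python
def sequential_path_hashes(n_instances, queries, tree_depth):
    # Each instance commits in its own tree: every query opens one
    # authentication path (tree_depth hashes) plus one leaf hash, per instance.
    return n_instances * queries * (tree_depth + 1)

def packed_path_hashes(n_instances, queries, tree_depth):
    # One shared tree with packed leaves: one path per query, plus a leaf
    # hash whose cost we model as linear in the number of packed evaluations.
    leaf_cost = n_instances
    return queries * (tree_depth + leaf_cost)
```

For, say, 8 instances, 30 queries, and a depth-16 tree, the packed count is several times smaller, though the leaf term still grows linearly in the number of packed instances.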
Our initial benchmarks (see the Performance section) show that for traces of size 2¹⁶×100, both the verifier time and the proof size are reduced by more than half compared to sequentially proving the satisfiability of all instances with Winterfell. In RISC0, even though traces have a fixed number of columns (275), the roughly twofold improvement in verifier time and proof size still applies. Of course, the price to pay when using the packing technique is a loss in the soundness of the protocol as the number of batched functions increases. In our upcoming technical document, we formally prove the soundness of the construction.
We remark that, for Merkle trees involving witness functions and quotient functions q(X), the costs of the Merkle decommitments for the first level of the tree are not amortized by our packing technique. This is because, as explained earlier, the Merkle tree leaves are now vectors of evaluations (one per packed instance), and thus their size grows linearly in the number N of packed instances. This partially explains why, in our experiments, the packed verifier’s work scales mostly linearly with the number of traces N.
One might expect the technique to also reduce the prover time, for reasons similar to those that reduce the verifier time. However, the cost of the field operations required to compute the combined DEEP-ALI constraint becomes increasingly important as the number of columns and packed instances grows. Indeed, there is no amortization for the computation of the combined constraint function in our technique, and this cost dominates as the number of columns in the batched RAP (or AIR) instances increases. This is visible in our benchmarks below, where for a large number of columns (275), the improvement in prover time is around 5%.
We discuss some example applications for the packing technique below. As mentioned earlier, it is worth noting that we can generate a single proof for the simultaneous satisfiability of instances that come from different arithmetizations, say, an AIR instance and a TurboPlonk (or Plonk-ish) instance. This could be of interest given the different trade-offs that each arithmetization offers and the fact that different rollups use proof systems with different arithmetizations. Further, the packing technique can easily be adapted to pack proofs for different proof systems.
All three applications below rely on the same principle: the packing method is used between layers of recursive STARK proving to reduce the size of the verifier circuits and of the recursive witnesses, resulting in faster recursive provers. In our upcoming technical paper, we will provide benchmarks of the improvements brought by this strategy. These should be at least as beneficial as the simple initial benchmarks we provide in the next section.
STARKs are widely used by prominent blockchain companies building zero-knowledge proof-based Ethereum scaling solutions called zk-rollups. These include StarkWare, Polygon, and zkSync. Recently, zkSync has launched its STARK-based L2 solution called Boojum, Polygon has deployed Polygon Miden, and StarkWare has deployed Starknet, all following a similar architecture. As seen from the Polygon L2 platform description [Tea22], on Layer 2, the correctness of each transaction is proven through an independent STARK proof, and STARK proofs for all transactions are generated independently but in parallel. Later on, these independent STARK proofs are merged through recursive proof composition. The resulting batch proof is a recursive STARK proof ensuring the validity of all input proofs.
Our technique could optimize this workflow by generating an aggregated proof of the validity of numerous independent transactions using packing first, thus generating a smaller and cheaper proof to verify. Accordingly, the next recursion iteration is cheaper for the prover since both the verifier circuit and the witnesses are smaller. This can be repeated at further layers of recursion as well.
Starknet has introduced SHARP, a shared prover system designed to process batches of Cairo computational programs off-chain, with their proofs being validated on-chain via a Solidity verifier contract. Since STARK program verification costs are polylogarithmic, various Cairo programs submitted by different users can be batch-verified in a single proof, effectively distributing the verification cost among all programs. The latest iteration of SHARP incorporates recursive proving, where each program is initially proven individually as it arrives. These individual proofs are then progressively merged into recursive proofs. This process continues until a final proof is ready to be submitted to the on-chain verifier contract. Our proposed method aims to enhance the efficiency of SHARP’s recursive proving. Instead of proving each program separately, a process that later becomes part of the recursive prover’s input, our packing argument allows for the bundling and proving of multiple statements simultaneously, resulting in a more compact and quicker-to-verify proof. This could lower the costs associated with generating recursive proofs.
The Ethereum Foundation recently presented fundamental ideas [KKZM23] towards building a Single Slot Finality consensus system. The note discusses two conceptually different signature aggregation topologies, tree-based and gossip-based, both of which rely on an efficient signature aggregation technique. The aggregation scheme allows multiple signatures to be combined into one and also keeps track of the identities of all participating signers. The difficulty in this use case is that each participant can appear in the aggregation process more than once. While there are well-known signature schemes that enable efficient aggregation, such as BLS, keeping track of all signers when each signer can appear multiple times in the aggregation process remains a challenge in communication complexity. The note discusses a new concept called Signature Merge that aims to address this issue efficiently through zero-knowledge cryptography. One of the suggested approaches for instantiating a Signature Merge scheme is based on recursive STARKs. A recursive STARK is distinctive in that the proven computation contains the verification of other STARK proofs, which in turn validate the correctness of an aggregated signature concerning a provided bitfield. The resulting recursive STARK proof ensures that the incoming proofs are valid and that the union of their bitfields equals the alleged result. One of the limiting factors of recursive STARKs for signature merging, as explained in [KKZM23], is the size of the proofs. This is precisely the scenario where we could leverage the packing proof aggregation technique over the independent STARK proofs first, before recursively generating the final proof. Again, we can apply the same idea at every layer of recursion, and this approach could be computationally cheaper overall.
We have implemented the packing technique in the Winterfell and RISC0 STARK libraries. Winterfell allows for writing arbitrary traces and provides excellent flexibility in optimizing the traces. The RISC0 architecture fixes certain structural elements of the AIR so that all traces have a fixed number (275) of columns independently of the program. The number of columns plays an important role in the final batch proving and verification performance.
We ran different benchmarks in both libraries. In all benchmarks, the number of rows is fixed to 2¹⁶. In Winterfell, the number of columns varies from 10 to 275 for a circuit that, in each column, computes the transition xᵢ₊₁ = xᵢ³ + 42 from i=1 until i=2¹⁶. In RISC0, the trace is fixed to 275 columns for a circuit proving knowledge of two prime factors of a number. All tests were run on an Intel i7-12700F machine with 32GB RAM. The speedup in proving time has been around 5% for the benchmarked instances; this modest improvement is due to the large number of columns. The Winterfell benchmarks described below for traces with fewer columns show how the proving-time improvement depends on the number of columns.
The diagrams below show the StarkPack performance improvement implemented in RISC0 for AIR traces of 275 columns and Winterfell for traces of 10, 100, and 275 columns.
This post was designed to offer a high-level summary of an aggregation method for FRI-based SNARKs, highlighting the improvements in verification complexity and proof sizes as evidenced by our initial benchmarks. A comprehensive explanation of this and further techniques, along with a soundness analysis, will be presented in the technical paper.