In this post, we study the problem of amortizing costs when one must produce many proofs simultaneously. The method, which we call “packing,” applies to FRI-based SNARKs (e.g., ethSTARK, Plonky2, RISC0, Boojum). It allows the generation of a single proof that many witnesses satisfy some constraints while reducing the verifier’s work and the proof size (compared to producing a proof for each witness and having the verifier check each of these). We show through experimentation that the technique is useful in practice.
In an upcoming technical paper, we will delve deeper into the packing mechanism and discuss other methods for aggregating FRI-based SNARK proving tasks. This work can be seen as a continuation of the community’s recent efforts in designing accumulation methods for SNARK-proving tasks (e.g., folding schemes). However, in the setting of FRI-based SNARKs, one faces the inconvenience of not having homomorphic commitment schemes, which requires approaches substantially different from those developed recently.
In more detail, packing is a batch-proof generation method that allows a prover to produce a single proof of the satisfiability of witnesses for potentially different instances. The main improvement brought by packing is that the prover and verifier perform only one FRI low-degree test across all instances. Compared to sequentially proving the satisfiability of each witness, a packed proof reduces the verifier time and the proof size, along with a slight reduction in prover time. Our benchmarks show that for typical traces (2¹⁶ rows, 100 columns), the improvement in verifier time is ~2x, and the reduction in proof size is ~3x.
As we explain later, this method could be particularly relevant in combination with long recursive proving tasks. For example, when a single prover recursively generates a proof of the validity of several witnesses, the prover could use packing between “recursive layers” to prove statements about smaller verifiers and smaller witnesses at the next layer.
We remark that the packing method can “aggregate” proof tasks for potentially different relations and types of FRI-based SNARK proofs. E.g., for two languages L₁, L₂ and two instances x₁, x₂, packing can amortize the cost of proving that x₁∈L₁ with, say, ethSTARK, together with the cost of proving that x₂∈L₂ with, say, Plonky2.
In the next section, we briefly overview the packing technique and its performance improvements. Afterward, we discuss some of its potential applications. We follow that by presenting experimental results. We postpone a formal description of the technique and its corresponding soundness analysis to our upcoming technical paper (which will also describe further optimization techniques).
We discuss our approach specifically for the ethSTARK [Sta21] protocol (sometimes called STARK). However, the technique can be adapted to any “FRI-based SNARK” (in the sense of [BGK+23]). Recall that the ethSTARK IOP follows the blueprint below, where 𝔽 denotes a finite field and 𝕂 a finite extension of 𝔽:

1. The prover commits to the evaluations of the witness (trace) polynomials over an evaluation domain D.
2. The verifier sends random challenges, which the prover uses to combine the constraints into a composition polynomial (ALI), committing to its evaluations over D.
3. The verifier samples an out-of-domain point z∈𝕂; the prover reveals the evaluations of the committed polynomials needed at z, from which both parties define the DEEP quotient maps.
4. The prover and verifier run the Batch-FRI low-degree test on a random linear combination of the witness and quotient maps (the combined DEEP-ALI constraint).
The main observation in our technique is that by using the linearity of the DEEP-ALI constraint combination and the Batch-FRI combination, we may pack constraints and functions across proof generations for different instances into a single combined constraint and a single DEEP-FRI function. In other words, when generating multiple ethSTARK proofs, one can take a random linear combination of all the DEEP-ALI constraints from all the individual proofs. Then, the prover and verifier execute Step 4 above for the resulting “global” DEEP-ALI constraint.
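As a toy sketch of this combination step (in Python, over an illustrative prime field and with hypothetical names; real instantiations work over an extension 𝕂 of a FRI-friendly field 𝔽):

```python
P = 2**31 - 1  # toy prime modulus (an illustrative assumption, not a FRI-friendly field)

def pack_deep_functions(deep_evals_per_proof, challenges):
    """Combine the DEEP quotient evaluations of m proofs over a shared
    evaluation domain D into a single function, via a random linear
    combination with verifier challenges alpha_1, ..., alpha_m."""
    domain_size = len(deep_evals_per_proof[0])
    packed = [0] * domain_size
    for alpha, evals in zip(challenges, deep_evals_per_proof):
        for i, e in enumerate(evals):
            packed[i] = (packed[i] + alpha * e) % P
    return packed
```

Since each summand is low-degree, so is their random linear combination, which is why a single FRI low-degree test over the packed function suffices (with soundness loss growing in the number of batched functions).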
When applying Batch-FRI, we must also reveal evaluations over the evaluation domain D of the batched maps. When instantiated with Merkle commitments, we pack all the evaluations of the witness and quotient maps at the same point in D into the same Merkle leaf (similarly to [MAGABMMT23]). I.e., following the notation above, a leaf in our Merkle trees looks like Hash(h₁(a), …, hₘ(a)), where a∈D and h₁, …, hₘ are all the witness and quotient maps being committed in the proofs. Note that this does not apply to Merkle trees in the inner rounds of Batch-FRI.
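A minimal sketch of such a packed Merkle commitment, assuming SHA-256 and an 8-byte field-element encoding (both illustrative choices, not the layout of any particular library):

```python
import hashlib

def packed_merkle_root(evals_per_map):
    """evals_per_map[j][i] holds h_j(a_i); the domain size must be a power of two.
    Each leaf packs the evaluations of all m maps at one domain point."""
    n = len(evals_per_map[0])
    # Leaf i is Hash(h_1(a_i), ..., h_m(a_i)).
    level = [
        hashlib.sha256(b"".join(e[i].to_bytes(8, "big") for e in evals_per_map)).digest()
        for i in range(n)
    ]
    # Standard binary Merkle tree over the packed leaves.
    while len(level) > 1:
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

With this layout, a single authentication path for a queried point a authenticates the evaluations of all m maps at once, which is where the savings in decommitment paths come from.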
In what follows, for the satisfiability of N witnesses for N instances (these could be Plonkish, AIR, RAP instances, etc.), the “sequential verifier” will refer to the verifier that checks a FRI-based proof (produced using, e.g., Plonky2, ethSTARK, or RISC0) for each instance sequentially. The “packed verifier” will refer to the verifier that checks the StarkPack proof for the simultaneous satisfiability of all witnesses. The same terminology applies to the provers.
The improvement over proving the satisfiability of different instances sequentially mainly comes from the fact that there are now fewer Merkle commitment trees and that we only perform one FRI proof for the satisfiability of all instances. The first point holds because we pack the evaluations of all relevant functions across instances into the same Merkle leaf at each round. The second holds, as explained above, because we combine all quotient functions into a single DEEP-FRI function.
More concretely, in terms of proof size, the proof contains fewer Merkle root commitments than the sequential proof. It further includes fewer Merkle decommitment paths since there is a single execution of FRI across all packed instances. There is no amortization, however, for the number of function evaluations the packed prover needs to send.
Regarding the packed verifier’s work, the improvement comes from the reduced number of hash operations it needs to perform, since the packed verifier checks fewer Merkle decommitment proofs. More concretely, for the Merkle trees involving witness functions and quotient functions q(X), the packed verifier hashes bigger leaves when checking Merkle membership proofs, but there are fewer of them. Further, the packed verifier checks Merkle decommitment paths corresponding to a single execution of FRI.
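The following back-of-envelope model (hypothetical parameter names, not measured data) illustrates this accounting: with per-instance trees, the verifier pays one authentication path per query per instance, while a packed tree costs one path per query plus one leaf hash whose size grows with the number of packed instances.

```python
def sequential_path_hashes(n_instances, queries, tree_depth):
    # Each instance commits in its own tree: every query opens one
    # authentication path (tree_depth hashes) plus one leaf hash, per instance.
    return n_instances * queries * (tree_depth + 1)

def packed_path_hashes(n_instances, queries, tree_depth):
    # One shared tree with packed leaves: one path per query, plus a leaf
    # hash whose cost we model as linear in the number of packed evaluations.
    leaf_cost = n_instances
    return queries * (tree_depth + leaf_cost)
```

For, say, 8 instances, 30 queries, and a depth-16 tree, the packed count is several times smaller, though the leaf term still grows linearly in the number of packed instances.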
Our initial benchmarks (see the Performance section) show that for traces of size 2¹⁶×100, both the verifier time and the proof size are reduced by more than half compared to sequentially proving the satisfiability of all instances with Winterfell. In RISC0, even though traces have a fixed number of columns (275), the roughly twofold improvement in verifier time and proof size still applies. Of course, the price to pay when using the packing technique is a loss in the soundness of the protocol as the number of batched functions increases. In our upcoming technical document, we formally prove the soundness of the construction.
We remark that, for Merkle trees involving witness functions and quotient functions q(X), the costs of the Merkle decommitments for the first level of the tree are not amortized by our packing technique. This is because, as explained earlier, the Merkle tree leaves are now vectors of evaluations (one per packed instance), and thus their size grows linearly in the number N of packed instances. This partially explains why, in our experiments, the packed verifier’s work scales mostly linearly with the number of traces N.
One might expect the technique to also reduce the prover time, for reasons similar to those that reduce the verifier time. However, the cost of the field operations required to compute the combined DEEP-ALI constraint becomes increasingly important as the number of columns and packed instances grows. Indeed, there is no amortization for the computation of the combined constraint function in our technique, and this cost dominates as the number of columns in the batched RAP (or AIR) instances increases. This is visible in our benchmarks below, where for a large number of columns (275), the improvement in prover time is around 5%.
We discuss some example applications for the packing technique below. As mentioned earlier, it is worth noting that we can generate a single proof for the simultaneous satisfiability of instances that come from different arithmetizations, say, an AIR instance and a TurboPlonk (or Plonk-ish) instance. This could be of interest given the different trade-offs that each arithmetization offers and the fact that different rollups use proof systems with different arithmetizations. Further, the packing technique can easily be adapted to pack proofs for different proof systems.
All three applications below rely on the same principle: the packing method is used between layers of recursive STARK proving to reduce the size of the verifier circuits and of the recursive witnesses, resulting in faster recursive provers. In our upcoming technical paper, we will provide benchmarks of the improvements brought by this strategy. These should be at least as beneficial as the simple initial benchmarks we provide in the next section.
STARKs are widely used by prominent blockchain companies building zero-knowledge proof-based Ethereum scaling solutions called zk-rollups. These include StarkWare, Polygon, and zkSync. Recently, zkSync has launched its STARK-based L2 solution called Boojum, Polygon has deployed Polygon Miden, and StarkWare has deployed Starknet, all following a similar architecture. As seen from the Polygon L2 platform description [Tea22], on Layer 2, the correctness of each transaction is proven through an independent STARK proof, and STARK proofs for all transactions are generated independently but in parallel. Later on, these independent STARK proofs are merged through recursive proof composition. The resulting batch proof is a recursive STARK proof ensuring the validity of all input proofs.
Our technique could optimize this workflow by generating an aggregated proof of the validity of numerous independent transactions using packing first, thus generating a smaller and cheaper proof to verify. Accordingly, the next recursion iteration is cheaper for the prover since both the verifier circuit and the witnesses are smaller. This can be repeated at further layers of recursion as well.
Starknet has introduced SHARP, a shared prover system designed to process batches of Cairo computational programs off-chain, with their proofs being validated on-chain via a Solidity verifier contract. Since STARK program verification costs are polylogarithmic, various Cairo programs submitted by different users can be batch-verified in a single proof, effectively distributing the verification cost among all programs. The latest iteration of SHARP incorporates recursive proving, where each program is initially proven individually as it arrives. These individual proofs are then progressively merged into recursive proofs. This process continues until a final proof is ready to be submitted to the on-chain verifier contract. Our proposed method aims to enhance the efficiency of SHARP’s recursive proving. Instead of proving each program separately, a process that later becomes part of the recursive prover’s input, our packing argument allows for the bundling and proving of multiple statements simultaneously, resulting in a more compact and quicker-to-verify proof. This could lower the costs associated with generating recursive proofs.
The Ethereum Foundation recently presented fundamental ideas [KKZM23] towards building a Single Slot Finality consensus system. The note discusses two conceptually different signature aggregation topologies, tree-based and gossip-based, both of which rely on an efficient signature aggregation technique. The aggregation scheme allows multiple signatures to be combined into one and also keeps track of the identities of all participating signers. The difficulty in this use case is that each participant can appear in the aggregation process more than once. While there are well-known signature schemes that enable efficient aggregation, such as BLS, keeping track of all signers when each signer can appear multiple times in the aggregation process remains a challenge in communication complexity. The note discusses a new concept called Signature Merge that aims to address this issue efficiently through zero-knowledge cryptography. One of the suggested approaches for instantiating a Signature Merge scheme is based on recursive STARKs. A recursive STARK is distinctive in that the proven computation contains the verification of other STARK proofs, which in turn validate the correctness of an aggregated signature concerning a provided bitfield. The resulting recursive STARK proof ensures that the incoming proofs are valid and that the union of their bitfields equals the alleged result. One of the limiting factors of recursive STARKs for signature merging, as explained in [KKZM23], is the size of the proofs. This is precisely the scenario where we could leverage the packing proof aggregation technique over the independent STARK proofs first, before recursively generating the final proof. Again, we can apply the same idea at every layer of recursion, and this approach could be computationally cheaper overall.
We have implemented the packing technique in the Winterfell and RISC0 STARK libraries. Winterfell allows for writing arbitrary traces and provides excellent flexibility in optimizing the traces. The RISC0 architecture fixes certain structural elements of the AIR so that all traces have a fixed number (275) of columns independently of the program. The number of columns plays an important role in the final batch proving and verification performance.
We ran different benchmarks in both libraries. In all benchmarks, the number of rows is fixed to 2¹⁶. In Winterfell, the number of columns varies from 10 to 275 for a circuit that, in each column, computes the transition xᵢ₊₁ = xᵢ³ + 42 from i=1 until i=2¹⁶. In RISC0, the trace is fixed to 275 columns for a circuit proving knowledge of two prime factors of a number. All tests were run on an Intel i7-12700F machine with 32GB RAM. The speedup in proving time has been around 5% for the benchmarked instances; this modest improvement is due to the large number of columns. The Winterfell benchmarks described below for traces with fewer columns show how the proving-time improvement depends on the number of columns.
The diagrams below show the StarkPack performance improvement implemented in RISC0 for AIR traces of 275 columns and Winterfell for traces of 10, 100, and 275 columns.
This post was designed to offer a high-level summary of an aggregation method for FRI-based SNARKs, highlighting the improvements in verification complexity and proof sizes as evidenced by our initial benchmarks. A comprehensive explanation of this and further techniques, along with a soundness analysis, will be presented in the technical paper.