polynomials

Introduction

Let \(A(X)\) be a polynomial over \(\mathbb{F}_p\) with formal indeterminate \(X\). As an example,

$$
A(X) = a_0 + a_1 X + a_2 X^2 + a_3 X^3
$$

defines a degree-\(3\) polynomial. \(a_0\) is referred to as the constant term. Polynomials of
degree \(n-1\) have \(n\) coefficients.

(aside) Horner’s rule

Horner’s rule allows for efficient evaluation of a polynomial of degree \(n-1\), using
only \(n-1\) multiplications and \(n-1\) additions. It is the following identity:
$$\begin{aligned}a_0 &+ a_1X + a_2X^2 + \cdots + a_{n-1}X^{n-1} \\ &= a_0 + X\bigg( a_1 + X \Big( a_2 + \cdots + X(a_{n-2} + X a_{n-1}) \Big)\bigg).\end{aligned}$$
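
To make this concrete, here is a minimal C++ sketch of Horner’s rule over a toy prime field (the modulus and coefficients are arbitrary illustrative choices):

#include <cstdint>
#include <cstdio>
#include <vector>

// Evaluate a_0 + a_1 x + ... + a_{n-1} x^{n-1} (mod p) with Horner's rule:
// n-1 multiplications and n-1 additions.
uint64_t horner_eval(const std::vector<uint64_t>& a, uint64_t x, uint64_t p) {
    uint64_t acc = 0;
    for (auto it = a.rbegin(); it != a.rend(); ++it)
        acc = (acc * x + *it) % p; // fold from the highest coefficient down
    return acc;
}

int main() {
    uint64_t p = 101;                       // toy prime
    std::vector<uint64_t> a = {5, 3, 0, 1}; // A(X) = 5 + 3X + X^3
    printf("%llu\n", (unsigned long long)horner_eval(a, 2, p)); // prints 19
}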

Quotient Poly

  1. divide the evaluations at each point and use an iFFT to recover the quotient’s coefficients
  2. or, use polynomial long division (a sketch of the linear-factor case follows)
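
For the common case of dividing by a linear factor \(X - z\), long division reduces to repeated Horner steps (synthetic division). A minimal sketch, assuming a toy prime field:

#include <cstdint>
#include <vector>

// Divide A(X) by (X - z) over F_p using synthetic division.
// Returns the quotient Q(X); the remainder equals A(z) and is dropped,
// so the division is exact precisely when z is a root of A.
std::vector<uint64_t> divide_linear(const std::vector<uint64_t>& a,
                                    uint64_t z, uint64_t p) {
    std::vector<uint64_t> q(a.size() - 1);
    uint64_t carry = 0;
    for (size_t i = a.size(); i-- > 1;) { // from the leading coefficient down
        carry = (carry * z + a[i]) % p;   // each Horner step is a quotient coeff
        q[i - 1] = carry;
    }
    return q;
}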

The Schwartz-Zippel lemma

The Schwartz-Zippel lemma informally states that “different polynomials are different at
most points.” Formally, it can be written as follows:

Let \(p(x_1, x_2, \cdots, x_n)\) be a nonzero polynomial in \(n\) variables with total degree \(d\).
Let \(S\) be a finite set of numbers with at least \(d\) elements in it. If we choose
\(\alpha_1, \alpha_2, \cdots, \alpha_n\) independently and uniformly at random from \(S\), then
$$\text{Pr}[p(\alpha_1, \alpha_2, \cdots, \alpha_n) = 0] \leq \frac{d}{|S|}.$$

In the familiar univariate case \(p(X)\), this reduces to saying that a nonzero polynomial
of degree \(d\) has at most \(d\) roots.

The Schwartz-Zippel lemma is used in polynomial equality testing. Given two multi-variate
polynomials \(p_1(x_1,\cdots,x_n)\) and \(p_2(x_1,\cdots,x_n)\) of degrees \(d_1, d_2\)
respectively, we can test if
\(p_1(\alpha_1, \cdots, \alpha_n) - p_2(\alpha_1, \cdots, \alpha_n) = 0\) for random
\(\alpha_1, \cdots, \alpha_n \leftarrow S,\) where \(|S| \geq d_1 + d_2.\) If the two
polynomials are identical, this will always be true, whereas if the two polynomials are
different then the equality holds with probability at most \(\frac{\max(d_1,d_2)}{|S|}\).
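
A minimal sketch of this randomized equality test for univariate polynomials in coefficient form (toy modulus; a real system would sample from a cryptographically large field):

#include <cstdint>
#include <random>
#include <vector>

static uint64_t eval(const std::vector<uint64_t>& a, uint64_t x, uint64_t p) {
    uint64_t acc = 0;
    for (auto it = a.rbegin(); it != a.rend(); ++it) acc = (acc * x + *it) % p;
    return acc;
}

// One round of testing p1 == p2 at a random point of S = F_p; a false
// "equal" verdict occurs with probability at most max(d1, d2) / p.
bool probably_equal(const std::vector<uint64_t>& p1,
                    const std::vector<uint64_t>& p2, uint64_t p) {
    std::mt19937_64 rng(std::random_device{}());
    uint64_t alpha = rng() % p; // random evaluation point
    return eval(p1, alpha, p) == eval(p2, alpha, p);
}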

Vanishing polynomial

Consider the order-\(n\) multiplicative subgroup \(\mathcal{H}\) with primitive root of unity
\(\omega\). For all \(\omega^i \in \mathcal{H},\ i \in [0, n-1],\) we have
\((\omega^i)^n = (\omega^n)^i = (\omega^0)^i = 1.\) In other words, every element of
\(\mathcal{H}\) fulfils the equation

$$
\begin{aligned}
Z_H(X) &= X^n - 1 \\
&= (X-\omega^0)(X-\omega^1)(X-\omega^2)\cdots(X-\omega^{n-1}),
\end{aligned}
$$

meaning every element is a root of \(Z_H(X).\) We call \(Z_H(X)\) the vanishing polynomial
over \(\mathcal{H}\) because it evaluates to zero on all elements of \(\mathcal{H}.\)

This comes in particularly handy when checking polynomial constraints. For instance, to
check that \(A(X) + B(X) = C(X)\) over \(\mathcal{H},\) we simply have to check that
\(A(X) + B(X) - C(X)\) is some multiple of \(Z_H(X)\). In other words, if dividing our
constraint by the vanishing polynomial still yields some polynomial
\(\frac{A(X) + B(X) - C(X)}{Z_H(X)} = H(X),\) we are satisfied that \(A(X) + B(X) - C(X) = 0\)
over \(\mathcal{H}.\)
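
An evaluation-side version of this check, as a sketch: with toy parameters \(p = 17\), \(n = 4\), \(\omega = 4\) (4 has multiplicative order 4 mod 17), we can verify that \(A + B - C\) vanishes on every element of \(\mathcal{H}\), which is equivalent to divisibility by \(Z_H\):

#include <cstdint>
#include <vector>

static uint64_t eval(const std::vector<uint64_t>& a, uint64_t x, uint64_t p) {
    uint64_t acc = 0;
    for (auto it = a.rbegin(); it != a.rend(); ++it) acc = (acc * x + *it) % p;
    return acc;
}

// Check A(x) + B(x) - C(x) == 0 for every x in H = {w^0, ..., w^{n-1}}.
bool constraint_holds_on_H(const std::vector<uint64_t>& A,
                           const std::vector<uint64_t>& B,
                           const std::vector<uint64_t>& C) {
    const uint64_t p = 17, n = 4, w = 4; // toy field and subgroup
    uint64_t x = 1;
    for (uint64_t i = 0; i < n; ++i, x = x * w % p)
        if ((eval(A, x, p) + eval(B, x, p) + p - eval(C, x, p)) % p != 0)
            return false;
    return true;
}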

Lagrange basis functions

Polynomials are commonly written in the monomial basis (e.g. \(1, X, X^2, \cdots, X^n\)). However,
when working over a multiplicative subgroup of order \(n\), we find a more natural expression
in the Lagrange basis.

Consider the order-\(n\) multiplicative subgroup \(\mathcal{H}\) with primitive root of unity
\(\omega\). The Lagrange basis corresponding to this subgroup is a set of functions
\(\{\mathcal{L}_i\}_{i = 0}^{n-1},\)
where

$$
\mathcal{L}_i(\omega^j) = \begin{cases}
1 & \text{if } i = j, \\
0 & \text{otherwise.}
\end{cases}
$$

We can write this more compactly as \(\mathcal{L_i}(\omega^j) = \delta_{ij},\) where
\(\delta\) is the Kronecker delta function.

Now, we can write our polynomial as a linear combination of Lagrange basis functions,

$$A(X) = \sum_{i = 0}^{n-1} a_i\mathcal{L_i}(X), X \in \mathcal{H},$$

which is equivalent to saying that \(A(X)\) evaluates to \(a_0\) at \(\omega^0\),
to \(a_1\) at \(\omega^1\), to \(a_2\) at \(\omega^2, \cdots,\) and so on.

When working over a multiplicative subgroup, the Lagrange basis function has a convenient
sparse representation of the form

$$
\mathcal{L}_i(X) = \frac{c_i\cdot(X^{n} - 1)}{X - \omega^i},
$$

where \(c_i\) is the barycentric weight. (To understand how this form was derived, refer to
[^barycentric].) For \(i = 0,\) we have
\(c_0 = 1/n \implies \mathcal{L}_0(X) = \frac{1}{n} \frac{(X^{n} - 1)}{X - 1}\).

Suppose we are given a set of evaluation points \(\{x_0, x_1, \cdots, x_{n-1}\}\).
Since we cannot assume that the \(x_i\)’s form a multiplicative subgroup, we consider also
the Lagrange polynomials \(\mathcal{L}_i\)’s in the general case. Then we can construct:

$$
\mathcal{L}_i(X) = \prod_{j\neq i}\frac{X - x_j}{x_i - x_j}, \quad i \in [0..n-1].
$$

Here, every \(X = x_j \neq x_i\) will produce a zero numerator term \((x_j - x_j),\) causing
the whole product to evaluate to zero. On the other hand, \(X= x_i\) will evaluate to
\(\frac{x_i - x_j}{x_i - x_j}\) at every term, resulting in an overall product of one. This
gives the desired Kronecker delta behaviour \(\mathcal{L_i}(x_j) = \delta_{ij}\) on the
set \(\{x_0, x_1, \cdots, x_{n-1}\}\).
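
A direct sketch of this product formula over a toy prime field (inverses computed via Fermat’s little theorem; assumes the \(x_j\) are distinct and \(p < 2^{32}\) so products fit in 64 bits):

#include <cstdint>
#include <vector>

static uint64_t pw(uint64_t b, uint64_t e, uint64_t p) {
    uint64_t r = 1;
    for (b %= p; e; e >>= 1, b = b * b % p)
        if (e & 1) r = r * b % p;
    return r;
}

// Evaluate L_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j) over F_p.
uint64_t lagrange_basis(const std::vector<uint64_t>& xs, size_t i,
                        uint64_t x, uint64_t p) {
    uint64_t num = 1, den = 1;
    for (size_t j = 0; j < xs.size(); ++j) {
        if (j == i) continue;
        num = num * ((x + p - xs[j]) % p) % p;
        den = den * ((xs[i] + p - xs[j]) % p) % p;
    }
    return num * pw(den, p - 2, p) % p; // num / den via den^(p-2) = den^{-1}
}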

Lagrange interpolation

Given a polynomial in its evaluation representation

$$A: \{(x_0, A(x_0)), (x_1, A(x_1)), \cdots, (x_{n-1}, A(x_{n-1}))\},$$

we can reconstruct its coefficient form in the Lagrange basis:

$$A(X) = \sum_{i = 0}^{n-1} A(x_i)\mathcal{L_i}(X), $$

where \(X \in \{x_0, x_1,\cdots, x_{n-1}\}.\)
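
Continuing the sketch above (reusing the hypothetical lagrange_basis helper), evaluating the interpolated \(A\) at an arbitrary point is a single weighted sum:

// A(x) = sum_i A(x_i) * L_i(x), given the evaluations ys[i] = A(xs[i]).
uint64_t interpolate_at(const std::vector<uint64_t>& xs,
                        const std::vector<uint64_t>& ys,
                        uint64_t x, uint64_t p) {
    uint64_t acc = 0;
    for (size_t i = 0; i < xs.size(); ++i)
        acc = (acc + ys[i] * lagrange_basis(xs, i, x, p)) % p;
    return acc;
}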

references

multi scalar multiplication (MSM)

Problem

Let \(\mathbb{G}\) be a group of prime order \(p \approx 2^{\lambda}\) (\(\lambda\) is the bit length of \(p\)). Given \(N\) scalars \(e_i\) and \(N\) EC points \(P_i\), calculate \(P\) such that
\[ P = \sum_{i=1}^{N}e_i P_i \]

naive approach

Let \(e_i = e_{i,\lambda-1} e_{i,\lambda-2} \cdots e_{i,0}\) in binary, where each \(e_{i,j} \in \{0,1\}\).
We have
\[ e_i = e_{i,\lambda-1}\cdot2^{\lambda-1} + e_{i,\lambda-2}\cdot2^{\lambda-2} + \cdots + e_{i,0}\cdot2^0 \]
To compute \(e_i \cdot P_i\), we can compute \(2^0 \cdot P_i, 2^1 \cdot P_i, \cdots, 2^{\lambda-1} \cdot P_i\) and add up the ones selected by the bits of \(e_i\). Here, we need \(\lambda-1\) doublings (multiplications by 2) and at most \(\lambda-1\) additions.
To compute \(e_1\cdot P_1, e_2 \cdot P_2, \cdots, e_N\cdot P_N\), we need \(N\cdot(\lambda-1)\) doublings and \(N\cdot(\lambda-1)\) additions.

We additionally need \(N-1\) additions to sum the \(e_i\cdot P_i\) into \(P\).
In total, we need \(N\cdot(\lambda-1)\) doublings and \(N\cdot\lambda-1\) additions.
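
A sketch of the naive approach; to keep it self-contained and testable, integer addition stands in for elliptic-curve point addition (add models point addition, dbl models doubling):

#include <cstdint>
#include <vector>

static uint64_t add(uint64_t a, uint64_t b) { return a + b; } // point-add stand-in
static uint64_t dbl(uint64_t a) { return a + a; }             // doubling stand-in

// Naive MSM: double-and-add each e_i * P_i, then sum the partial results.
uint64_t naive_msm(const std::vector<uint64_t>& e,
                   const std::vector<uint64_t>& P) {
    uint64_t total = 0;
    for (size_t i = 0; i < e.size(); ++i) {
        uint64_t acc = 0, base = P[i];
        for (uint64_t bits = e[i]; bits; bits >>= 1) {
            if (bits & 1) acc = add(acc, base); // add 2^j * P_i if bit j is set
            base = dbl(base);                   // base becomes 2^{j+1} * P_i
        }
        total = add(total, acc); // N - 1 extra additions overall
    }
    return total;
}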

pippenger approach

At a high level, Pippenger’s approach divides the \(\lambda\) bits into num_window \(= \lceil\frac{\lambda}{c}\rceil\) windows, where each window has \(c\) bits. \(c\) is also called the “window size”.

Part I (computation for each window)

Initialize a vector window_res = [0,0,...,0] of length num_window.
For the w-th window:

  • Initialize a vector buckets = [0,0,...,0] of length num_buckets = \( 2^c - 1 \).
  • Get the start_idx of the current window, say \(M\). Remember that we are considering a \(c\)-bit window out of a \(\lambda\)-bit scalar.
  • For each pair \((e_i, P_i)\):
    • Get the bits in scalar \(e_i\) corresponding to the current window. Formally, we have scalar = \((e_i >> M) \mod 2^c\).
    • If scalar \(\ne 0\), we have buckets[scalar-1] += \(P_i\)
  • Initialize running = 0 and tmp = 0
  • For j in \(2^c-2\) down to 0:
    running = running + buckets[j] (running sum of the buckets from the top down)
    tmp = tmp + running (after the loop, tmp \(= \sum_j (j+1)\cdot\) buckets[j], i.e. each point weighted by its window digit)
  • window_res[w] = tmp

Part II (sum over all windows)

lowest = window_res[0]
total = 0
for w in num_window-1 down to 1:
    total += window_res[w]
    for _ in 0..c:
        total = total + total // shift left c bits
total = total + lowest

total is the final result
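
Putting both parts together, a compact sketch (again modeling points as integers so the result can be checked against plain arithmetic; \(\lambda = 64\) and the window size c is a parameter):

#include <cstdint>
#include <vector>

static uint64_t add(uint64_t a, uint64_t b) { return a + b; } // point-add stand-in

uint64_t pippenger_msm(const std::vector<uint64_t>& e,
                       const std::vector<uint64_t>& P, unsigned c) {
    const unsigned lambda = 64, num_window = (lambda + c - 1) / c;
    std::vector<uint64_t> window_res(num_window, 0);
    for (unsigned w = 0; w < num_window; ++w) {    // Part I
        std::vector<uint64_t> buckets((1u << c) - 1, 0);
        unsigned M = w * c;                        // start bit of this window
        for (size_t i = 0; i < e.size(); ++i) {
            uint64_t s = (e[i] >> M) & ((1ull << c) - 1);
            if (s != 0) buckets[s - 1] = add(buckets[s - 1], P[i]);
        }
        uint64_t running = 0, tmp = 0;             // running-sum trick
        for (size_t j = buckets.size(); j-- > 0;) {
            running = add(running, buckets[j]);    // = sum of buckets[j..]
            tmp = add(tmp, running);               // accumulates (j+1)*buckets[j]
        }
        window_res[w] = tmp;
    }
    uint64_t total = 0;                            // Part II: high to low
    for (unsigned w = num_window; w-- > 0;) {
        if (w + 1 < num_window)                    // no shift for the top window
            for (unsigned k = 0; k < c; ++k)
                total = add(total, total);         // shift left by c bits
        total = add(total, window_res[w]);
    }
    return total;
}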

In Part I, for each window, we need \(N+2^c\) additions. Since we have \(\frac{\lambda}{c}\) windows, we need \((N+2^c)\cdot\frac{\lambda}{c}\) additions.

In Part II, we need 1 addition and \(c\) doublings for each window.
In total, we need \(\frac{\lambda}{c}\cdot(N+2^c+1)\) additions and \(\lambda\) doublings.

In the Arkworks implementation, \(c\) is set to \(\ln(N)+2\).
For notational simplicity, let’s set \(c = \log_2(N)\). Then the computational complexity of Pippenger’s algorithm is \(\frac{\lambda}{\log_2(N)}\cdot(2N+1) \approx \frac{2N\lambda}{\log_2(N)}\) additions and \(\lambda\) doublings.
Since \(\lambda\) is a constant for a given elliptic curve, people usually say that Pippenger’s algorithm has a time complexity of \(O(\frac{N}{\log(N)})\).

References

cpp build

cmake

environment variables

  • CMAKE_SOURCE_DIR: the root directory of the source tree for the project. This variable holds the full path to the directory where the top-level CMakeLists.txt file is located.

system config

  • /etc/ld.so.conf
    The file /etc/ld.so.conf is a configuration file used by the dynamic linker/loader on Unix-like systems, including Linux. Its purpose is to specify additional directories where the linker should search for shared libraries at runtime.
    When a program starts, the dynamic linker/loader is responsible for resolving and loading shared libraries (also known as dynamic libraries or shared objects). The paths specified in /etc/ld.so.conf help the linker locate these libraries.
    On some systems, you might also find additional configuration files in the /etc/ld.so.conf.d/ directory, each containing a list of directories.
    After modifying the file, you need to run the ldconfig command to update the dynamic linker’s cache and apply the changes.

note: ld stands for linker

tools

nm

nm is a command-line utility on Unix-like operating systems that displays information about the symbols (e.g., functions, variables) contained in object files, executables, and shared libraries. The name “nm” stands for “name list.”

nm -D ${path_to_dynamic_library}

The default information that the ‘nm’ command provides is:

  • Virtual address of the symbol
  • A character which depicts the symbol type. If the character is in lower case then the symbol is local but if the character is in upper case then the symbol is external
  • Name of the symbol

The characters that identify symbol type describe :

  • A : Global absolute symbol.
  • a : Local absolute symbol.
  • B : Global bss symbol.
  • b : Local bss symbol.
  • D : Global data symbol.
  • d : Local data symbol.
  • f : Source file name symbol.
  • L : Global thread-local symbol (TLS).
  • l : Static thread-local symbol (TLS).
  • T : Global text symbol.
  • t : Local text symbol.
  • U : Undefined symbol.

objdump

environment variables

  • LD_LIBRARY_PATH
    The LD_LIBRARY_PATH environment variable is used in Unix-like operating systems (including Linux) to specify a list of directories where the system should look for shared libraries before searching the default system locations. This variable is particularly useful when you want to run an executable that depends on shared libraries that are not in standard library paths.
  1. compiling
    g++ -o my_program my_program.cpp -lmy_library -L/path/to/library
  2. running
    export LD_LIBRARY_PATH=/path/to/library:$LD_LIBRARY_PATH
    ./my_program

field

Pseudo-Mersenne Prime

A pseudo-Mersenne prime is a prime of the form
\[ p = 2^m -k \]
where \(k\) is an integer for which
\[ 0 \lt |k| \lt 2^{\lfloor m/2 \rfloor} \]

If \(k=1\), then \(p\) is a Mersenne prime (and \(m\) must necessarily be prime). If \(k=-1\), then \(p\) is called a Fermat prime (and \(m\) must necessarily be a power of two).

Pseudo-Mersenne primes are useful in public-key cryptography because they admit fast modular reduction similar to Mersenne primes. If \(n\) is a positive integer less than \(p^2\), then \(n\) can be written as
\[ n = u \cdot 2^{2m} + a \cdot 2^{m} + b \]
where \(u=0\) or \(1\) and \(a\) and \(b\) are nonnegative integers less than \(2^m\). Then
\[ n \equiv u \cdot k^2 + a \cdot k + b \mod p\]

Repeating this substitution a few times will yield \(n \mod p\).
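
A minimal sketch of this reduction, assuming the toy parameters \(m = 31\), \(k = 19\) (so \(p = 2^{31} - 19\), a prime just below \(2^{31}\)); each pass rewrites \(a \cdot 2^m + b\) as \(a \cdot k + b\), which is congruent mod \(p\):

#include <cstdint>

static const uint32_t M = 31, K = 19;
static const uint64_t P = (1ull << M) - K; // pseudo-Mersenne prime 2^31 - 19

// Reduce n < P^2 modulo P without division.
uint64_t pm_reduce(uint64_t n) {
    while (n >> M) {                       // while high bits remain
        uint64_t a = n >> M, b = n & ((1ull << M) - 1);
        n = a * K + b;                     // a*2^m + b == a*k + b (mod p)
    }
    return n >= P ? n - P : n;             // final conditional subtraction
}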

Optimal Extension Fields (OEF)

Optimal extension fields (OEFs) are a family of finite fields whose arithmetic can be implemented efficiently in software. OEFs are extension fields \(GF(p^m)\) where the prime \(p\) is of a special form.

An Optimal Extension Field is a finite field \(GF(p^m)\) such that

  1. \(p\) is a pseudo-Mersenne prime
  2. An irreducible binomial \(P(x) = x^m - \omega\) exists over \(GF(p)\)

other concepts

two adicity

A two-adicity of 32 means that there’s a multiplicative subgroup of size \(2^{32}\) that exists in the field.
For example:

const TWO_ADICITY: u32 = 32;
const TWO_ADIC_ROOT_OF_UNITY: BigInteger = BigInteger([
    0x218077428c9942de, 0xcc49578921b60494, 0xac2e5d27b2efbee2, 0xb79fa897f2db056,
]); // TWO_ADIC_ROOT_OF_UNITY^{2^32} = 1
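
In practice this constant is used to derive a primitive \(2^k\)-th root of unity for any \(k \le\) TWO_ADICITY by raising it to the power \(2^{\text{TWO\_ADICITY} - k}\). A toy C++ sketch with \(p = 97\) (two-adicity 5, since \(97 - 1 = 2^5 \cdot 3\); 28 generates the order-\(2^5\) subgroup):

#include <cstdint>
#include <cstdio>

static const uint64_t P = 97, TWO_ADICITY = 5, TWO_ADIC_ROOT = 28;

static uint64_t pw(uint64_t b, uint64_t e) {
    uint64_t r = 1;
    for (; e; e >>= 1, b = b * b % P)
        if (e & 1) r = r * b % P;
    return r;
}

int main() {
    uint64_t k = 3; // want a primitive 2^3 = 8-th root of unity
    uint64_t w = pw(TWO_ADIC_ROOT, 1ull << (TWO_ADICITY - k));
    printf("w = %llu, w^8 = %llu\n",
           (unsigned long long)w, (unsigned long long)pw(w, 8)); // w^8 = 1
}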

references

  • Pseudo-Mersenne Prime
  • [Optimal Extension Field] Bailey DV, Paar C (1998) Optimal extension fields for fast arithmetic in public-key algorithms. In: Krawczyk H (ed), Advances in cryptology–CRYPTO ’98, LNCS 1462. Springer, Berlin, pp 472–485

polygon zkEVM

zkProver

State Machines

The zkProver follows modularity of design to the extent that, except for a few components, it is mainly a cluster of State Machines. It has a total of thirteen (13) State Machines:

  • The Main State Machine
  • Secondary State Machines → Binary SM, Storage SM, Memory SM, Arithmetic SM, Keccak Function SM, PoseidonG SM
  • Auxiliary State Machines → Padding-PG SM, Padding-KK SM, Bits2Field SM, Memory Align SM, Byte4 SM, ROM SM

Due to the modular design of the zkProver, the Main State Machine can delegate as many tasks as possible to other specialist State Machines. This heavily improves the efficiency of the Main SM.

Two Novel Languages For zkProver

Zero-Knowledge Assembly (zkASM)

As an assembly language, the Zero-Knowledge Assembly (or zkASM) language is specially designed to map instructions from the zkProver’s Main State Machine to other State Machines. In the case of State Machines with firmware, zkASM is the interpreter for the firmware.

Polynomial Identity Language (PIL)

The Polynomial Identity Language (or PIL) is especially designed for the zkProver. Almost all state machines express computations in terms of polynomials. Therefore, state transitions in State Machines must satisfy computation-specific polynomial identities.

Microprocessor State Machines

There are two microprocessor-type state machines: the Main SM and the Storage SM. These two State Machines each have a firmware part and a hardware part. It is worth noting that each of these microprocessor SMs has its own ROM.

The firmware part runs the zkASM language to set up the logic and rules, which are expressed in JSON format and stored in a ROM. The JSON file is then passed to the specific SM executor, which executes Storage Actions in compliance with the rules and logic in the JSON file.

The hardware component, which uses the Polynomial Identity Language (PIL), defines constraints (or polynomial identities), expresses them in JSON format, and stores them in the accompanying JSON file. These constraints are passed to the specific SM executor, because all computations must be executed in conformance with the polynomial identities.

montgomery multiplication

montgomery space

Montgomery multiplication works by first transforming the multipliers into Montgomery space, where modular multiplication can be performed cheaply, and then transforming them back when their actual values are needed. Unlike general integer division methods, Montgomery multiplication is not efficient for performing just one modular reduction and only becomes worthwhile when there is a chain of modular operations.

The space is defined by the modulus \(n\) and a positive integer \(r \ge n\) coprime to \(n\). The algorithm involves reduction modulo \(r\) and division by \(r\), so in practice \(r\) is chosen to be \(2^{32}\) or \(2^{64}\): reduction modulo \(r\) is then a bitwise AND and division by \(r\) is a right shift.

Definition. The representative \(\bar{x}\) of a number \(x\) in the Montgomery space is defined as
\[ \bar{x} = x \cdot r \mod n \]

Computing this transformation involves a multiplication and a modulo — an expensive operation that we wanted to optimize away in the first place — which is why we only use this method when the overhead of transforming numbers to and from the Montgomery space is worth it and not for general modular multiplication.

Inside the Montgomery space, addition, subtraction, and equality checks are performed as usual:

\[ x \cdot r + y \cdot r \equiv (x+y) \cdot r \mod n \]

However, this is not the case for multiplication. Denoting multiplication in the Montgomery space as \(∗\) and the “normal” multiplication as \(\cdot\), we expect the result to be:
\[ \bar{x} * \bar{y} = \overline{x\cdot y} = (x\cdot y)\cdot r \mod n \]

But the normal multiplication in the Montgomery space yields:
\[ \bar{x} \cdot \bar{y}=(x\cdot y)\cdot r \cdot r \mod n \]

Therefore, the multiplication in the Montgomery space is defined as
\[ \bar{x} * \bar{y} = \bar{x} \cdot \bar{y} \cdot r^{-1} \mod n \]

This means that, after we normally multiply two numbers in the Montgomery space, we need to reduce the result by multiplying it by \(r^{-1}\) and taking the modulo, and there is an efficient way to do this particular operation.

Montgomery reduction

Assume that \(r=2^{32}\), the modulus \(n\) is 32-bit, and the number \(x\) we need to reduce is 64-bit (the product of two 32-bit numbers). The goal is to calculate \(y=x\cdot r^{-1} \mod n\).

Since \(r\) is coprime to \(n\), we know that there are two numbers \(r^{-1}\) and \(n'\) such that
\[ r \cdot r^{-1} + n \cdot n' = 1; \]
both \(r^{-1}\) and \(n'\) can be computed using the extended Euclidean algorithm (\(n' = n^{-1} \bmod r\)).
Using this identity, we can express \(r \cdot r^{-1}\) as \(1-n\cdot n'\) and write \(x \cdot r^{-1}\) as

\[
\displaylines{
x\cdot r^{-1} = \frac{x \cdot r \cdot r^{-1}}{r} \\
= \frac{x \cdot (1-n\cdot n')}{r} \\
= \frac{x - x \cdot n \cdot n'}{r} \\
= \frac{x - x \cdot n \cdot n' + k\cdot r \cdot n}{r} \pmod{n} \quad \text{(for any integer } k\text{)} \\
= \frac{x - (x\cdot n' - k\cdot r)\cdot n}{r}
}
\]

Now, if we choose \(k\) to be \(\lfloor x\cdot n'/r\rfloor\) (the upper 64 bits of \(x\cdot n'\), given that \(x\) is 64 bits and \(n'\) is 32 bits), it will cancel out, and \(x\cdot n' - k\cdot r\) will simply be equal to \(x \cdot n' \bmod r\) (the lower 32 bits of \(x\cdot n'\)), implying:
\[ x\cdot r^{-1} = (x - (x\cdot n' \bmod r )\cdot n)/r \]

The algorithm itself just evaluates this formula, performing two multiplications to calculate \(q = x\cdot n' \bmod r\) and \(m = q\cdot n\), and then subtracts \(m\) from \(x\) and right-shifts the result to divide it by \(r\).

The only remaining thing to handle is that the result may not be in the \([0,n)\) range; but since
\[ x < n\cdot n < r\cdot n \Rightarrow x/r < n \]
and
\[ m = q\cdot n < r\cdot n \Rightarrow m/r < n, \]
it is guaranteed that
\[ -n < (x-m)/r < n. \]

Therefore, we can simply check if the result is negative and in that case, add \(n\) to it, giving the following algorithm:

typedef __uint32_t u32;
typedef __uint64_t u64;

// nr is n' in the equations above and r is 1 << 32; inverse() computes
// n^{-1} mod r (see "Faster Inverse" below)
const u32 n = 1e9 + 7, nr = inverse(n, 1ull << 32);

u32 reduce(u64 x) {
    u32 q = u32(x) * nr;      // q = x * n' mod r
    u64 m = (u64) q * n;      // m = q * n
    u32 y = (x - m) >> 32;    // y = (x - m) / r
    return x < m ? y + n : y; // if y < 0, add n to bring it into the [0, n) range
}

This last check is relatively cheap, but it is still on the critical path. If we are fine with the result being in the \([0,2\cdot n−2]\) range instead of \([0, n)\), we can remove it and add \(n\) to the result unconditionally:

u32 reduce(u64 x) {
    u32 q = u32(x) * nr;
    u64 m = (u64) q * n;
    u32 y = (x - m) >> 32;
    return y + n; // result is in [0, 2n), not [0, n)
}

We can also move the >> 32 operation one step earlier in the computation graph and compute \(\lfloor x/r \rfloor - \lfloor m/r \rfloor\) instead of \((x-m)/r\). This is correct because the lower 32 bits of \(x\) and \(m\) are equal anyway, since
\[ m = q \cdot n = x\cdot n' \cdot n \equiv x \pmod r \]
(taking the value mod \(r\) is the same as taking the lower 32 bits).

But why would we voluntarily choose to perform two right-shifts instead of just one? This is beneficial because for ((u64) q * n) >> 32 we need to do a 32-by-32 multiplication and take the upper 32 bits of the result (which the x86 mul instruction already writes to a separate register, so it doesn’t cost anything), and the other right-shift x >> 32 is not on the critical path.

u32 reduce(u64 x) {
    u32 q = u32(x) * nr;
    u32 m = ((u64) q * n) >> 32;
    return (x >> 32) + n - m;
}

To reduce a u128 number (the product of two u64 multipliers), the same idea carries over:

typedef __uint128_t u128;

// here n is a 64-bit modulus and nr = n^{-1} mod 2^64
u64 reduce(u128 x) {
    u64 q = u64(x) * nr;
    u64 m = ((u128) q * n) >> 64;
    return (x >> 64) + n - m;
}

Faster Inverse

Montgomery multiplication itself is fast, but it requires some precomputation:

  • inverting \(n\) modulo \(r\) to compute \(n’\)
  • transforming a number to the Montgomery space,
  • transforming a number from the Montgomery space.

The last operation is already efficiently performed with the reduce procedure we just implemented, but the inversion can be slightly optimized.
Computing the inverse \(n' = n^{-1} \bmod r\) can be done faster than with the extended Euclidean algorithm by taking advantage of the fact that \(r\) is a power of two and using the following identity:
\[ a\cdot x \equiv 1 \bmod 2^k \Rightarrow a\cdot x\cdot (2-a\cdot x) \equiv 1 \bmod 2^{2k} \]
Proof:
\[
\displaylines{
a\cdot x \cdot (2-a\cdot x) = 2\cdot a\cdot x - (a\cdot x)^2 \\
= 2\cdot(1+m\cdot 2^k) - (1+m\cdot 2^k)^2 \\
= 2 + 2\cdot m\cdot 2^k - 1 - 2\cdot m \cdot 2^k - m^2\cdot 2^{2k} \\
= 1 - m^2\cdot 2^{2k} \\
\equiv 1 \pmod{2^{2k}}
}
\]
note

  • \(a\cdot x\) can be written as \(1+m\cdot 2^k\) for some integer \(m\), since \(a\cdot x \equiv 1 \bmod 2^k\)

In our case, \(x\) is \(nr\) and \(a\) is \(n\), i.e. \(n\cdot nr \equiv 1 \bmod 2^k\). Since \(n\) is an odd number, we can start with \(nr=1\) as the inverse of \(n\) modulo \(2^1\). Applying the identity once gives \(n\cdot 1 \cdot (2-n\cdot 1) \equiv 1 \bmod 2^2\). Letting \(nr = 1\cdot(2-n\cdot 1)\) and applying the identity again, we get \(n\cdot nr \cdot (2-n\cdot nr) \equiv 1 \bmod 2^4\). Following this approach, we apply the identity exactly \(\log_2 m\) times, where \(m\) is the bit length of \(r\).

Complete Implementation


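The code block here is empty in the source; below is a minimal self-contained sketch assembling the pieces above (reduce with \(r = 2^{64}\), the doubling-based inverse, and the transforms in and out of Montgomery space), with a toy modulus:

#include <cstdint>
#include <cstdio>

typedef __uint64_t u64;
typedef __uint128_t u128;

static const u64 n = 1000000007; // toy odd modulus; r = 2^64

// n' = n^{-1} mod 2^64: six applications of the doubling identity
// take the trivial inverse mod 2^1 up to mod 2^64.
static u64 inverse(u64 a) {
    u64 x = 1;
    for (int i = 0; i < 6; ++i) x *= 2 - a * x;
    return x;
}

static const u64 nr = inverse(n);

static u64 reduce(u128 x) { // returns x * r^{-1}, in the [0, 2n) range
    u64 q = (u64)x * nr;
    u64 m = (u64)(((u128)q * n) >> 64);
    return (u64)(x >> 64) + n - m;
}

// r mod n and r^2 mod n, used to enter Montgomery space.
static const u64 r1 = (u64)(((u128)1 << 64) % n);
static const u64 r2 = (u64)((u128)r1 * r1 % n);

static u64 to_mont(u64 x) { return reduce((u128)x * r2); }   // x -> x*r
static u64 from_mont(u64 x) {                                // x*r -> x, in [0, n)
    u64 y = reduce(x);
    return y >= n ? y - n : y;
}
static u64 mul(u64 a, u64 b) { return reduce((u128)a * b); } // (a*r)(b*r) -> ab*r

int main() {
    u64 a = to_mont(123456789), b = to_mont(987654321);
    printf("%llu\n", (unsigned long long)from_mont(mul(a, b)));
    // expect 123456789 * 987654321 mod 1000000007
}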

references

kzg polynomial commitment

introduction

KZG polynomial commitments were introduced by Kate, Zaverucha and Goldberg. It is called a commitment because, having sent the commitment value (an elliptic curve point) to someone (the verifier), the prover cannot change the polynomial they are working with.

Comparison to Merkle trees

A Merkle tree is what cryptographers call a vector commitment: using a Merkle tree of depth \(d\), you can compute a commitment to a vector of \(2^d\) elements. A polynomial commitment can be achieved by Merkle-committing to all of the polynomial’s coefficients; however, this is not efficient.

pairings

Let \(\mathbb{G_1}\) and \(\mathbb{G_2}\) be two elliptic curves with a pairing \(e: \mathbb{G_1} \times \mathbb{G_2} \rightarrow \mathbb{G_T}\). Let \(p\) be the order of \(\mathbb{G_1}\) and \(\mathbb{G_2}\), and let \(G\) and \(H\) be generators of \(\mathbb{G_1}\) and \(\mathbb{G_2}\). We will use the very useful shorthand notation
\([x]_1 = xG \in \mathbb{G_1}\) and \([x]_2 = xH \in \mathbb{G_2}\), for any \(x \in \mathbb{F_p}\).

Trusted setup

Let’s assume we have a trusted setup, so that for some secret \(s\), the elements \([s^i]_1\) and \([s^i]_2\) are available to both prover and verifier for \(i=0, …, n-1\).

Let \(p(X) = \sum_{i=0}^{n-1}p_i X^i \) be a polynomial; the prover can compute

\( \left[ p(s) \right]_1 = p_0[s^0]_1 + p_1[s^1]_1 + \cdots + p_{n-1}[s^{n-1}]_1 \)

Kate commitment

In the Kate commitment scheme, the element \(C = [p(s)]_1\) is the commitment to the polynomial \(p(X)\). Could the prover (without knowing \(s\)) find another polynomial \(q(X) \neq p(X)\) that has the same commitment? Let’s assume that this were the case. Then it would mean \([p(s) - q(s)]_1 = 0\), implying \(p(s) - q(s) = 0\).

Now, \(r(X) = p(X) - q(X)\) is itself a polynomial, and we know that it is not the zero polynomial because \(p(X) \neq q(X)\). It is a well-known fact that any nonzero polynomial of degree \(n\) can have at most \(n\) zeroes.
Since the prover doesn’t know \(s\), the only way they could achieve \(p(s) - q(s) = 0\)
is by making \(p(X) - q(X) = 0\) in as many places as possible. But since they can do that in at most \(n\)
places, as we’ve just argued, they are very unlikely to succeed: \(n\) is much smaller than the order of the curve, \(p\), so the probability that \(s\) is one of the points they chose to make \(p(X) = q(X)\) is vanishingly tiny.

Multiplying polynomials

While elliptic curves themselves don’t allow polynomial multiplication, luckily we can do it with pairings. We have
\[ e([a]_1, [b]_2) = e(G,H)^{ab} = [ab]_T; \]
while we unfortunately can’t just multiply two field elements inside an elliptic curve and get their product as an elliptic curve element, we can multiply two field elements if we committed to them in different curves (one in \(\mathbb{G_1}\) and one in \(\mathbb{G_2}\)), and the output is a \(\mathbb{G_T}\) element.

Now let’s say we want to prove that \(p(z)=y\). We will use the polynomial \(p(X) - y\); this polynomial is clearly zero at \(z\). Let \(q(X)\) be the polynomial \(p(X) - y\) divided by the linear factor \(X-z\), i.e.
\[ q(X) = \frac{p(X) - y}{X-z}, \]
which is equivalent to saying that \(q(X)(X-z) = p(X) - y\).

Kate proofs

Now a Kate proof for the evaluation \(p(z) = y\) is defined as \(\pi = [q(s)]_1\). Remember that the commitment to the polynomial \(p(X)\) is defined as \(C=[p(s)]_1\).
The verifier checks this proof using the following equation:
\( e(\pi, [s-z]_2) = e(C-[y]_1, H) \)

Note that the verifier can compute \([s-z]_2\), because it is just a combination of the element \([s]_2\) from the trusted setup and \(z\), the point at which the polynomial is evaluated.
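
Why this works: by bilinearity, both sides are pairings whose exponents are the committed values, so the check passes exactly when the quotient identity holds at \(s\):

\[ e(\pi, [s-z]_2) = [q(s)\cdot(s-z)]_T \qquad \text{and} \qquad e(C-[y]_1, H) = [p(s) - y]_T, \]

so the equation holds if and only if \(q(s)(s-z) = p(s) - y\), which (except with negligible probability over the choice of \(s\)) forces \(q(X)(X-z) = p(X) - y\) as polynomials.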

multiproofs

So far we have shown how to prove an evaluation of a polynomial at a single point. Note that this is already pretty amazing: You can show that some polynomial that could be any degree – say \(2^{28}\) – at some point takes a certain value, by only sending a single group element (that could be 48 bytes, for example in BLS12_381). The toy example of using Merkle trees as polynomial commitments would have needed to send \(2^{28}\) elements – all the coefficients of the polynomial.

Let’s say we have a list of \(k\) points \((z_0, y_0),(z_1, y_1), \cdots, (z_{k-1}, y_{k-1})\). Using Lagrange interpolation, we can find a polynomial \(I(X)\) that goes through all of these points.

Now let’s assume that we know \(p(X)\) goes through all these points. Then the polynomial \(p(X) - I(X)\) will clearly be zero at each of \(z_0, z_1, \cdots, z_{k-1}\). The zero polynomial for this set is
\(Z(X) = (X-z_0)\cdot (X-z_1)\cdots(X-z_{k-1}).\)
Now, we can compute the quotient
\(q(X) = \frac{p(X)-I(X)}{Z(X)}.\)
We can now define the Kate multiproof for the evaluations \((z_0, y_0),(z_1, y_1),\cdots, (z_{k-1}, y_{k-1})\) as \(\pi = [q(s)]_1\). Note that this is still only one group element.

Now, to check this, the verifier will also have to compute the interpolation polynomial \(I(X)\) and the zero polynomial \(Z(X)\). Using these, they can compute \([Z(s)]_2\) and \([I(s)]_1\), and thus verify the pairing equation
\[ e(\pi, [Z(s)]_2) = e(C-[I(s)]_1, H). \]

Kate as a vector commitment

While the Kate commitment scheme is designed as a polynomial commitment, it actually also makes a really nice vector commitment. Remember that a vector commitment commits to a vector \(a_0, \cdots, a_{n-1}\) and lets you prove that you committed to \(a_i\) for some \(i\). We can reproduce this using the Kate commitment scheme: let \(p(X)\) be the polynomial that for all \(i\) evaluates as \(p(i)=a_i\).
Using this polynomial, we can prove any number of elements in the vector with just a single group element! Note how much more efficient (in terms of proof size) this is compared to Merkle trees.

references

Dankrad Feist Post