# What is the theory behind PEC?#

Probabilistic error cancellation (PEC) [4, 7, 9] is a noise-aware error mitigation technique which is based on two main ideas:

• The first idea is to express ideal gates as linear combinations of implementable noisy gates. These linear combinations are called quasi-probability representations ;

• The second idea is to probabilistically sample from the previous quasi-probability representations to approximate quantum expectation values via a Monte Carlo average.

Note: In this section we follow the same notation of .

## Quasi-probability representations#

In PEC, each ideal gate $$\mathcal G_i$$ of a circuit of interest $$\mathcal U = {\mathcal G}_t \circ \dots \circ {\mathcal G}_2 \circ {\mathcal G}_1$$ is represented as a linear combination of noisy implementable operations $${\mathcal O_{i, \alpha}}$$ (i.e., operations that can be directly applied with a noisy backend):

$\mathcal G_i = \sum_\alpha \eta_{i, \alpha} \mathcal O_{i, \alpha}, \quad \eta_{i, \alpha} \in \mathbb R,$

where the calligraphic symbols ($$\mathcal U$$, $$\mathcal G_i$$, $$\mathcal O_{i, \alpha}$$) stand for super-operators acting on the density matrix of the qubits as linear quantum channels.

The real coefficients $${\eta_{i,\alpha}}$$ form a quasi-probability distribution  with respect to the index $$\alpha$$. Their sum is normalized but, differently from standard probabilities, they can take negative values:

$\sum_\alpha \eta_{i,\alpha}=1, \qquad \gamma_i = \sum_\alpha |\eta_{i, \alpha}| \ge 1.$

The constant $$\gamma_i$$ quantifies the negativity of the quasi-probability distribution which is directly related to the error mitigation cost associated to the gate $$\mathcal G_i$$.

Note: In principle, the gate index “$$i$$” in the noisy operations $$\mathcal O_{i, \alpha}$$ could be dropped. However, we keep it to explicitly define gate-dependent basis of implementable operations, consistently with the structure of the OperationRepresentation class discussed in What additional options are available in PEC?.

## Error cancellation#

The aim of PEC is estimating the ideal expectation value of some observable $$A=A^\dagger$$ with respect to the quantum state prepared by an ideal circuit of interest $$\mathcal U$$ acting on some initial reference state $$\rho_0$$ (typically $$\rho_0= |0\dots 0 \rangle \langle 0 \dots 0 |$$).

Replacing each gate $$\mathcal G_i$$ with its noisy representation, we can express the ideal expectation value as a linear combination of noisy expectation values:

$\langle A \rangle_{\rm ideal}= {\rm tr}[A \mathcal U (\rho_0)] = \sum_{\vec{\alpha}} \eta_{\vec{\alpha}} \langle A_{\vec{\alpha}}\rangle_{\rm noisy}$

where we introduced the multi-index $$\vec{\alpha}=(\alpha_1, \alpha_2, \dots ,\alpha_t)$$ and

$\eta_{\vec{\alpha}} := \Pi_{i=1}^t \eta_{i, \alpha_i}, \quad \langle A_{\vec{\alpha}}\rangle_{\rm noisy} := {\rm tr}[A \Phi_{\vec{\alpha}}(\rho_0)], \quad \Phi_{\vec{\alpha}} := \mathcal O_{t, \alpha_t} \circ \dots \circ \mathcal O_{2, \alpha_2} \circ \mathcal O_{1, \alpha_1}.$

The coefficients $$\{ \eta_{\vec{\alpha}} \}$$ form a quasi-probability distribution for the global circuit over the noisy circuits. Indeed it is easy to check that, at the level of super-operators, we have:

$\mathcal U = \sum_{\vec{\alpha}} \eta_{\vec{\alpha}} \Phi_{\vec{\alpha}}.$

The one-norm $$\gamma$$ of the global quasi-probability distribution is the product of those of the gates:

$\sum_{\vec \alpha} \eta_{\vec{\alpha}}=1, \qquad \gamma = \sum_{\vec{\alpha}} |\eta_{\vec \alpha}| = \Pi_{i=1}^{t} \gamma_i.$

All the noisy expectation values $$\langle A_{\vec{\alpha}}\rangle_{\rm noisy}$$ can be directly measured with a noisy backend since they only require circuits composed of implementable noisy operations. In principle, by combining all the noisy expectation values, one could compute the ideal result $$\langle A \rangle_{\rm ideal}$$. Unfortunately this approach requires executing a number of circuits which grows exponentially with the circuit depth and which is typically unfeasible.

An important fact at the basis of PEC is that, for weak noise, only a small number of noisy expectation values actually contribute to the linear combination because many of the coefficients $$\eta_{\vec \alpha}$$ are negligible. For this reason, it is more efficient to estimate $$\langle A \rangle_{\rm ideal}$$ using an importance-sampling Monte Carlo approach as described in the next section.

## Monte Carlo estimation#

To apply a Monte Carlo estimation, we need to replace quasi-probabilities with positive probabilities. This can be achieved as follows:

$\mathcal{G_i} = \sum_{\alpha} \eta_{i, \alpha} \mathcal{O}_{i, \alpha} = \gamma_i \sum_{\alpha} p_i(\alpha) \, {\rm sgn}(\eta_{i, \alpha})\, \mathcal{O}_{i, \alpha},$

where $$p_{i}(\alpha)=|\eta_{i, \alpha}|/\gamma_i$$ is a valid probability distribution with respect to $$\alpha$$.

If for each gate $$\mathcal G_i$$ of the circuit we sample a value of $$\alpha$$ from $$p_{i}(\alpha)$$ and we apply the corresponding noisy operation $$\mathcal O_{i, \alpha}$$, we are effectively sampling a noisy circuit $$\Phi_{\vec{\alpha}}$$ from the global probability distribution $$p(\vec{\alpha})= |\eta_{\vec{\alpha}}| / \gamma$$.

Therefore, at the level of quantum channels, we have:

$\mathcal U = \gamma \mathbb E \left\{ {\rm sgn}(\eta_{i, \vec{\alpha}}) \Phi_{\vec{\alpha}} \right\},$

where $$\mathbb E$$ is the sample average over many repetitions of the previous probabilistic procedure and $${\rm sgn}(\eta_{\vec{ \alpha}}) = \Pi_i {\rm sgn}(\eta_{i, \alpha})$$. As a direct consequence, we can express the ideal expectation value as follows:

$\langle A \rangle_{\text{ideal}} = \gamma\, \mathbb E \left\{ {\rm sgn}(\eta_{\vec{\alpha}}) \langle A_{\vec{\alpha}}\rangle_{\rm noisy} \right\}.$

By averaging a finite number $$N$$ of samples we obtain an unbiased estimate of $$\langle A \rangle_{\text{ideal}}$$. Assuming a bounded observable $$|A|\le 1$$, the number of samples $$N$$ necessary to approximate $$\langle A\rangle_{\text{ideal}}$$ within an absolute error $$\delta$$, scales as :

$N \propto \frac{\gamma^2}{\delta^2}.$

The term $$\delta^2$$ in the denominator is due to the stochastic nature of quantum measurements and is present even when directly estimating an expectation value without error mitigation. The $$\gamma^2$$ factor instead represents the sampling overhead associated to PEC. For weak noise and short circuits, $$\gamma$$ is typically small and PEC is applicable with a reasonable cost. On the contrary, if a circuit is too noisy or too deep, the value of $$\gamma$$ can be so large that PEC becomes unfeasible.