# What is the theory behind PEC?#

Probabilistic error cancellation (PEC) [4, 7, 9] is a noise-aware error mitigation technique which is based on two main ideas:

The first idea is to express ideal gates as linear combinations of implementable noisy gates. These linear combinations are called quasi-probability representations [10];

The second idea is to probabilistically sample from the previous quasi-probability representations to approximate quantum expectation values via a Monte Carlo average.

*Note:* In this section we follow the same notation of [1].

## Quasi-probability representations#

In PEC, each ideal gate \(\mathcal G_i\) of a circuit of interest \(\mathcal U = {\mathcal G}_t \circ \dots \circ {\mathcal G}_2 \circ {\mathcal G}_1 \) is represented as a linear combination of noisy implementable operations \({\mathcal O_{i, \alpha}}\) (i.e., operations that can be directly applied with a noisy backend):

where the calligraphic symbols (\(\mathcal U\), \(\mathcal G_i\), \(\mathcal O_{i, \alpha}\)) stand for super-operators acting on the density matrix of the qubits as linear quantum channels.

The real coefficients \({\eta_{i,\alpha}}\) form a quasi-probability distribution [10] with respect to the index \(\alpha\). Their sum is normalized but, differently from standard probabilities, they can take negative values:

The constant \(\gamma_i\) quantifies the negativity of the quasi-probability distribution which is directly related to the error mitigation cost associated to the gate \(\mathcal G_i\).

**Note:** In principle, the gate index “\(i\)” in the noisy operations \(\mathcal O_{i, \alpha}\) could be dropped.
However, we keep it to explicitly define gate-dependent basis of implementable operations, consistently with
the structure of the `OperationRepresentation`

class discussed in What additional options are
available in PEC?.

## Error cancellation#

The aim of PEC is estimating the ideal expectation value of some observable \(A=A^\dagger\) with respect to the quantum state prepared by an ideal circuit of interest \(\mathcal U\) acting on some initial reference state \(\rho_0\) (typically \(\rho_0= |0\dots 0 \rangle \langle 0 \dots 0 |\)).

Replacing each gate \(\mathcal G_i\) with its noisy representation, we can express the ideal expectation value as a linear combination of noisy expectation values:

where we introduced the multi-index \(\vec{\alpha}=(\alpha_1, \alpha_2, \dots ,\alpha_t)\) and

The coefficients \(\{ \eta_{\vec{\alpha}} \}\) form a quasi-probability distribution for the global circuit over the noisy circuits. Indeed it is easy to check that, at the level of super-operators, we have:

The one-norm \(\gamma\) of the global quasi-probability distribution is the product of those of the gates:

All the noisy expectation values \(\langle A_{\vec{\alpha}}\rangle_{\rm noisy}\) can be directly measured with a noisy backend since they only require circuits composed of implementable noisy operations. In principle, by combining all the noisy expectation values, one could compute the ideal result \(\langle A \rangle_{\rm ideal}\). Unfortunately this approach requires executing a number of circuits which grows exponentially with the circuit depth and which is typically unfeasible.

An important fact at the basis of PEC is that, for weak noise, only a small number of noisy expectation values actually contribute to the linear combination because many of the coefficients \(\eta_{\vec \alpha}\) are negligible. For this reason, it is more efficient to estimate \(\langle A \rangle_{\rm ideal}\) using an importance-sampling Monte Carlo approach as described in the next section.

## Monte Carlo estimation#

To apply a Monte Carlo estimation, we need to replace quasi-probabilities with positive probabilities. This can be achieved as follows:

where \(p_{i}(\alpha)=|\eta_{i, \alpha}|/\gamma_i\) is a valid probability distribution with respect to \(\alpha\).

If for each gate \(\mathcal G_i\) of the circuit we sample a value of \(\alpha\) from \(p_{i}(\alpha)\) and we apply the corresponding noisy operation \(\mathcal O_{i, \alpha}\), we are effectively sampling a noisy circuit \(\Phi_{\vec{\alpha}}\) from the global probability distribution \(p(\vec{\alpha})= |\eta_{\vec{\alpha}}| / \gamma\).

Therefore, at the level of quantum channels, we have:

where \(\mathbb E\) is the sample average over many repetitions of the previous probabilistic procedure and \({\rm sgn}(\eta_{\vec{ \alpha}}) = \Pi_i {\rm sgn}(\eta_{i, \alpha})\). As a direct consequence, we can express the ideal expectation value as follows:

By averaging a finite number \(N\) of samples we obtain an unbiased estimate of \(\langle A \rangle_{\text{ideal}}\). Assuming a bounded observable \(|A|\le 1\), the number of samples \(N\) necessary to approximate \(\langle A\rangle_{\text{ideal}}\) within an absolute error \(\delta\), scales as [8]:

The term \(\delta^2\) in the denominator is due to the stochastic nature of quantum measurements and is present even when directly estimating an expectation value without error mitigation. The \(\gamma^2\) factor instead represents the sampling overhead associated to PEC. For weak noise and short circuits, \(\gamma\) is typically small and PEC is applicable with a reasonable cost. On the contrary, if a circuit is too noisy or too deep, the value of \(\gamma\) can be so large that PEC becomes unfeasible.