---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.11.1
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# What is the theory behind PEC?


Probabilistic error cancellation (PEC) {cite}`Temme_2017_PRL, Sun_2021_PRAppl, Shuaining_2020_NatComm`  is a noise-aware error mitigation technique which is
based on two main ideas:

- The first idea is to express ideal gates as linear combinations of implementable noisy gates.
These linear combinations are called quasi-probability representations {cite}`Pashayan_2015_PRL`;

- The second idea is to probabilistically sample from the previous quasi-probability representations to approximate
quantum expectation values via a Monte Carlo average.

**Note:* In this section we follow the same notation of {cite}`Mari_2021_PRA`.*

## Quasi-probability representations

In PEC, each ideal gate $\mathcal G_i$ of a circuit of interest
$\mathcal U = {\mathcal G}_t \circ  \dots \circ {\mathcal G}_2 \circ {\mathcal G}_1 $
is represented as a linear combination of noisy implementable operations ${\mathcal O_{i, \alpha}}$ (i.e., operations that
can be directly applied with a noisy backend):

$$
\mathcal G_i = \sum_\alpha \eta_{i, \alpha} \mathcal O_{i, \alpha},
\quad  \eta_{i, \alpha} \in \mathbb R,
$$

where the calligraphic symbols ($\mathcal U$, $\mathcal G_i$, $\mathcal O_{i, \alpha}$) stand for super-operators acting
on the density matrix of the qubits as linear quantum channels.

The real coefficients ${\eta_{i,\alpha}}$ form a quasi-probability distribution {cite}`Pashayan_2015_PRL` with respect to the
index $\alpha$. Their sum is normalized but, differently from standard probabilities, they can take negative values:

 $$ \sum_\alpha \eta_{i,\alpha}=1,  \qquad  \gamma_i = \sum_\alpha |\eta_{i, \alpha}| \ge 1.$$

The constant $\gamma_i$ quantifies the negativity of the quasi-probability distribution which is directly related
to the error mitigation cost associated to the gate $\mathcal G_i$.

***Note:** In principle, the gate index "$i$" in the noisy operations $\mathcal O_{i, \alpha}$ could be dropped.
However, we keep it to explicitly define gate-dependent basis of implementable operations, consistently with
the  structure of the `OperationRepresentation` class discussed in [What additional options are
available in PEC?](pec-3-options.md).*


## Error cancellation

The aim of PEC is estimating the ideal expectation value of some observable $A=A^\dagger$ with respect to
the quantum state prepared by an ideal circuit of interest $\mathcal U$ acting on some initial  reference
state $\rho_0$ (typically $\rho_0= |0\dots 0 \rangle \langle 0 \dots 0 |$).

Replacing each gate $\mathcal G_i$ with its noisy representation, we can express the ideal expectation
value as a linear combination of noisy expectation values:

$$
\langle A \rangle_{\rm ideal}= {\rm tr}[A \mathcal U (\rho_0)] =
\sum_{\vec{\alpha}} \eta_{\vec{\alpha}} \langle A_{\vec{\alpha}}\rangle_{\rm noisy}
$$

where we introduced the multi-index $\vec{\alpha}=(\alpha_1, \alpha_2, \dots ,\alpha_t)$ and

$$
\eta_{\vec{\alpha}} := \prod_{i=1}^t \eta_{i, \alpha_i},
\quad  \langle A_{\vec{\alpha}}\rangle_{\rm noisy} :=  {\rm tr}[A \Phi_{\vec{\alpha}}(\rho_0)],
\quad \Phi_{\vec{\alpha}} := \mathcal O_{t, \alpha_t} \circ \dots \circ \mathcal O_{2, \alpha_2} \circ \mathcal O_{1, \alpha_1}.
$$


The coefficients $\{ \eta_{\vec{\alpha}} \}$ form a quasi-probability distribution
for the global circuit over the noisy circuits. Indeed it is easy to check that, at the level of super-operators,
we have:

$$ \mathcal U =  \sum_{\vec{\alpha}} \eta_{\vec{\alpha}} \Phi_{\vec{\alpha}}. $$

The one-norm $\gamma$ of the global quasi-probability distribution is the product of those of the gates:

$$
\sum_{\vec \alpha} \eta_{\vec{\alpha}}=1,  \qquad  \gamma = \sum_{\vec{\alpha}} |\eta_{\vec \alpha}| = \prod_{i=1}^{t} \gamma_i.
$$

All the noisy expectation values $\langle A_{\vec{\alpha}}\rangle_{\rm noisy}$ can be directly measured with
a noisy backend since they only require circuits composed of implementable noisy operations.
In principle, by combining all the noisy expectation values, one could compute the ideal result $\langle A \rangle_{\rm ideal}$.
Unfortunately this approach requires executing a number of circuits which grows exponentially with the circuit depth and
which is typically unfeasible.

An important fact at the basis of PEC is that, for weak noise, only a small number of noisy expectation values actually
contribute to the linear combination because many of the coefficients $\eta_{\vec \alpha}$ are negligible.
For this reason, it is more efficient to estimate $\langle A \rangle_{\rm ideal}$ using an
 [importance-sampling](https://en.wikipedia.org/wiki/Importance_sampling) Monte Carlo approach as described in the next section.

## Monte Carlo estimation

To apply a Monte Carlo estimation, we need to replace quasi-probabilities with positive probabilities.
This can be achieved as follows:

$$ \mathcal{G_i} = \sum_{\alpha} \eta_{i, \alpha} \mathcal{O}_{i, \alpha}
= \gamma_i \sum_{\alpha} p_i(\alpha) \, {\rm sgn}(\eta_{i, \alpha})\, \mathcal{O}_{i, \alpha},$$

where $p_{i}(\alpha)=|\eta_{i, \alpha}|/\gamma_i$ is a valid probability distribution with respect to $\alpha$.

If for each gate $\mathcal G_i$ of the circuit we sample a value of $\alpha$ from $p_{i}(\alpha)$ and we apply the corresponding noisy operation
$\mathcal O_{i, \alpha}$, we are effectively sampling a noisy circuit $\Phi_{\vec{\alpha}}$ from the
global probability distribution $p(\vec{\alpha})= |\eta_{\vec{\alpha}}| / \gamma$.

Therefore, at the level of quantum channels, we have:

$$ \mathcal U = \gamma \mathbb E \left\{  {\rm sgn}(\eta_{i, \vec{\alpha}}) \Phi_{\vec{\alpha}} \right\},$$

where $\mathbb E$ is the sample average over many repetitions of the previous probabilistic procedure and
${\rm sgn}(\eta_{\vec{ \alpha}}) = \prod_i {\rm sgn}(\eta_{i, \alpha})$.
As a direct consequence, we can express the ideal expectation value as follows:

$$\langle A \rangle_{\text{ideal}} = \gamma\,
\mathbb E \left\{  {\rm sgn}(\eta_{\vec{\alpha}}) \langle A_{\vec{\alpha}}\rangle_{\rm noisy} \right\}.$$

By averaging a finite number $N$ of samples we obtain an unbiased estimate of $\langle A \rangle_{\text{ideal}}$.
Assuming a bounded observable $|A|\le 1$, the number of samples $N$
necessary to approximate $\langle A\rangle_{\text{ideal}}$ within an absolute error $\delta$,
scales as {cite}`Takagi_2020_PRR`:

$$ N \propto \frac{\gamma^2}{\delta^2}. $$

The term $\delta^2$ in the denominator is due to the stochastic nature of quantum measurements
and is present even when directly estimating an expectation value without error mitigation.
The $\gamma^2$ factor instead represents the sampling overhead associated to PEC.
For weak noise and short circuits, $\gamma$ is typically small and PEC is applicable with a reasonable
cost.
On the contrary, if a circuit is too noisy or too deep, the value of $\gamma$ can be so large that PEC becomes
unfeasible.