# What is the theory behind CDR?

Clifford data regression (CDR) is a quantum error mitigation technique introduced in Ref. [15] and extended to variable-noise CDR in Ref. [2]. This error mitigation (EM) strategy is designed for gate-based quantum computers. The method consists primarily of creating a training data set \(\{(X_{\phi_i}^{\text{error}}, X_{\phi_i}^{\text{exact}})\}\), where \(X_{\phi_i}^{\text{error}}\) and \(X_{\phi_i}^{\text{exact}}\) are the expectation values of an observable \(X\) for a state \(|\phi_i\rangle\), evaluated with and without errors, respectively.

This method includes the following steps:

## Step 1: Choose Near-Clifford Circuits for Training

Near-Clifford circuits are chosen for training because they can be simulated efficiently on a classical computer. The set of states they prepare is denoted by \(S_\psi=\{|\phi_i\rangle\}_i\).
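
As a rough illustration of how such training circuits might be generated, the sketch below represents a circuit only by its list of \(R_z\) rotation angles (a simplification assumed here, not taken from the references). A rotation \(R_z(\theta)\) is a Clifford gate when \(\theta\) is a multiple of \(\pi/2\), so all but a few angles are snapped to the nearest such multiple. The function names and the "nearest angle" replacement rule are assumptions of this sketch; other replacement rules are possible.

```python
import math
import random

# Rz(theta) is a Clifford gate (up to a global phase) for these angles.
CLIFFORD_ANGLES = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]


def nearest_clifford_angle(theta):
    """Return the multiple of pi/2 closest to theta (taken mod 2*pi)."""
    theta = theta % (2 * math.pi)
    k = round(theta / (math.pi / 2)) % 4
    return CLIFFORD_ANGLES[k]


def make_near_clifford(angles, n_non_clifford, rng):
    """Keep n_non_clifford randomly chosen angles; snap the rest to Clifford angles."""
    keep = set(rng.sample(range(len(angles)), n_non_clifford))
    return [
        a if i in keep else nearest_clifford_angle(a)
        for i, a in enumerate(angles)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
target_angles = [0.3, 1.2, 2.9, 4.0, 5.5]  # hypothetical circuit of interest
training_circuits = [make_near_clifford(target_angles, 1, rng) for _ in range(4)]

for circuit in training_circuits:
    print([round(a, 3) for a in circuit])
```

Keeping the number of non-Clifford gates small is what keeps the classical simulation cost manageable in this picture.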

## Step 2: Construct the Training Set

The training set \(\{(X_{\phi_i}^{\text{error}}, X_{\phi_i}^{\text{exact}})\}_i\) is constructed by calculating the expectation values of \(X\) for each state \(|\phi_i\rangle\) in \(S_\psi\), on both a quantum computer (to obtain \(X_{\phi_i}^{\text{error}}\)) and a classical computer (to obtain \(X_{\phi_i}^{\text{exact}}\)).
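
The pairing of noisy and exact values can be sketched as follows. Both `noisy_expectation` and `exact_expectation` are hypothetical stand-ins: in practice the first would run on quantum hardware and the second on a classical simulator. The toy noise model (a damping factor plus an offset) is made up purely for illustration.

```python
import math


def exact_expectation(theta):
    """Stand-in for a classical simulation of a one-parameter toy state."""
    return math.cos(theta)


def noisy_expectation(theta, damping=0.8, offset=0.05):
    """Stand-in for a hardware run; damping and offset are invented values."""
    return damping * math.cos(theta) + offset


def build_training_set(states, noisy_fn, exact_fn):
    """Pair the noisy and exact expectation values for every training state."""
    return [(noisy_fn(s), exact_fn(s)) for s in states]


# Toy training states, labelled by multiples of pi/2.
training_states = [k * math.pi / 2 for k in range(4)]
training_set = build_training_set(
    training_states, noisy_expectation, exact_expectation
)

for x_error, x_exact in training_set:
    print(f"noisy = {x_error:+.3f}   exact = {x_exact:+.3f}")
```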

## Step 3: Learn the Error Mitigation Model

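A model \(f(X^{\text{error}}, a)\) for \(X^{\text{exact}}\) is defined and learned. Here, \(a\) is the set of parameters to be determined. The parameters are found by minimizing the distance between the model predictions and the exact values over the training set, as expressed by the following least-squares optimization problem:

\[
a_{opt} = \underset{a}{\text{argmin}} \sum_i \left| X_{\phi_i}^{\text{exact}} - f(X_{\phi_i}^{\text{error}}, a) \right|^2
\]

In this expression, \(a_{opt}\) are the parameters that minimize the cost function.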
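
To make the fitting step concrete, here is a minimal sketch that assumes a linear model \(f(x, a) = a_1 x + a_2\) and a sum-of-squares cost, for which the minimizer has a closed form (ordinary least squares). The training pairs are invented numbers, and the linear ansatz is only one possible choice of model.

```python
def fit_linear_model(training_set):
    """Least-squares fit of f(x, a) = a1 * x + a2 to (noisy, exact) pairs."""
    n = len(training_set)
    mean_x = sum(x for x, _ in training_set) / n
    mean_y = sum(y for _, y in training_set) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in training_set)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in training_set)
    if sxx == 0:
        raise ValueError("all noisy values are identical; the slope is undetermined")
    a1 = sxy / sxx
    a2 = mean_y - a1 * mean_x
    return a1, a2


# Invented (noisy, exact) pairs that satisfy noisy = 0.8 * exact + 0.05.
training_set = [
    (-0.75, -1.0),
    (-0.35, -0.5),
    (0.05, 0.0),
    (0.45, 0.5),
    (0.85, 1.0),
]

a_opt = fit_linear_model(training_set)
print(f"a1 = {a_opt[0]:.4f}, a2 = {a_opt[1]:.4f}")
```

Because these toy data lie exactly on a line, the fit recovers the inverse of the assumed noise map; with real data the residual would generally be nonzero.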

## Step 4: Apply the Error Mitigation Model

Finally, the learned model \(f(X^{\text{error}}, a_{opt})\) is used to correct the noisy expectation value of \(X\) for a new quantum state \(|\psi\rangle\), with the exact value estimated as \(X_\psi^{\text{exact}} = f(X_\psi^{\text{error}}, a_{opt})\).
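
Applying the learned model is then a single function evaluation. The sketch below again assumes a linear model; the parameter values and the noisy measurement are illustrative numbers, not results from a real fit or device.

```python
def linear_model(x_error, a):
    """Evaluate the assumed linear model f(x, a) = a1 * x + a2."""
    a1, a2 = a
    return a1 * x_error + a2


a_opt = (1.25, -0.0625)  # illustrative parameters, as if returned by a fit
x_psi_error = 0.61       # hypothetical noisy expectation value for |psi>

x_psi_mitigated = linear_model(x_psi_error, a_opt)
print(f"noisy = {x_psi_error:.4f}, mitigated = {x_psi_mitigated:.4f}")
```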

The effectiveness of this method has been demonstrated on circuits with up to 64 qubits and for tasks such as estimating ground-state energies. However, its performance depends on the task, the system, the quality of the training data, and the choice of model.