ZNE with Qiskit: Layerwise folding#

This tutorial shows an example of how to mitigate noise on IBMQ backends using layerwise folding in contrast with global folding.

One may ask why folding by layer is potentially beneficial to consider. One reason is that applying global folding will increase the length of the entire circuit while layerwise folding on a subset of only the noisiest layers will increase the circuit by a smaller factor.

If running a circuit on hardware is bottle-necked by the cost of running a long circuit, this technique could potentially be used to arrive at a better result (although not as good as global folding) but with less monetary cost.

More information on the layerwise folding technique can be found in Calderon et al. Quantum (2023) [3].

Setup#

from typing import Dict, List, Optional
import numpy as np
import os
import cirq
import qiskit
import matplotlib.pyplot as plt

from mitiq import zne
from mitiq.zne.scaling.layer_scaling import layer_folding, get_layer_folding
from mitiq.interface.mitiq_qiskit.qiskit_utils import initialized_depolarizing_noise
from mitiq.interface.mitiq_qiskit.conversions import to_qiskit
from mitiq.interface.mitiq_cirq.cirq_utils import sample_bitstrings

from cirq.contrib.svg import SVGCircuit
from qiskit_aer import QasmSimulator

# Default to a simulator.
noise_model = initialized_depolarizing_noise(noise_level=0.02)
backend = QasmSimulator(noise_model=noise_model)

shots = 10_000

Helper functions#

The following function will return a list of circuits where the ith element in the list is a circuit with layer “i” folded num_folds number of times. This will be useful when analyzing how much folding increases the noise on a given layer.

def apply_num_folds_to_all_layers(circuit: cirq.Circuit, num_folds: int = 1) -> List[cirq.Circuit]:
    """List of circuits where ith circuit is folded `num_folds` times."""
    return [
        layer_folding(circuit, [0] * i + [num_folds] + [0] * (len(circuit) - i))
        for i in range(len(circuit))
    ]

For instance, consider the following circuit.

# Define a basic circuit for
q0, q1 = cirq.LineQubit.range(2)
circuit = cirq.Circuit(
    [cirq.ops.H(q0)],
    [cirq.ops.CNOT(q0, q1)],
    [cirq.measure(cirq.LineQubit(0))],
)
print(circuit)
0: ───H───@───M───
          │
1: ───────X───────

Let us invoke the apply_num_folds_to_all_layers function as follows.

folded_circuits = apply_num_folds_to_all_layers(circuit, num_folds=2)

Note that the first element of the list is the circuit with the first layer of the circuit folded twice.

print(folded_circuits[0])
0: ───H───H───H───H───H───@───M───
                          │
1: ───────────────────────X───────

Similarly, the second element of the list is the circuit with the second layer folded.

print(folded_circuits[1])
0: ───H───@───@───@───@───@───M───
          │   │   │   │   │
1: ───────X───X───X───X───X───────

Define circuit to analyze#

We will use the following circuit to analyze, but of course, you could use other circuits here as well.

circuit = cirq.Circuit([cirq.X(cirq.LineQubit(0))] * 10, cirq.measure(cirq.LineQubit(0)))
print(circuit)
0: ───X───X───X───X───X───X───X───X───X───X───M───

Total variational distance metric#

An \(i\)-inversion can be viewed as a local perturbation of the circuit. We want to define some measure by which we can determine how much such a perturbation affects the outcome.

Define the quantity:

\[ p(k|C) = \langle \langle k | C | \rho_0 \rangle \rangle \]

as the probability distribution over measurement outcomes at the output of a circuit \(C\) where \(k \in B^n\) with \(B^n\) being the set of all \(n\)-length bit strings where \(\langle \langle k |\) is the vectorized POVM element that corresponds to measuring bit string \(k\).

The impact of applying an inversion is given by

\[ d \left[p(\cdot|C), p(\cdot|C^{(i)})\right] \]

where \(d\) is some distance measure. In Calderon et al. Quantum (2023) [3] the authors used the total variational distance (TVD) measure where

\[ \eta^{(i)} := \frac{1}{2} \sum_{k} |p(k|C) - p(k|C^{(i)})|. \]
def tvd(circuit: cirq.Circuit, num_folds: int = 1, shots: int = 10_000) -> List[float]:
    """Compute the total variational distance (TVD) between ideal circuit and folded circuit(s)."""
    circuit_dist = sample_bitstrings(circuit=circuit, shots=shots).prob_distribution()

    folded_circuits = apply_num_folds_to_all_layers(circuit, num_folds)

    distances: Dict[int, float] = {}
    for i, folded_circuit in enumerate(folded_circuits):
        folded_circuit_dist = sample_bitstrings(circuit=folded_circuit, shots=shots).prob_distribution()

        res: float = 0.0
        for bitstring in circuit_dist.keys():
            res += np.abs(circuit_dist[bitstring] - folded_circuit_dist[bitstring])
        distances[i] = res / 2

    return distances

Impact of single vs. multiple folding#

We can plot the impact of applying layer inversions to the circuit.

def plot_single_vs_multiple_folding(circuit: cirq.Circuit) -> None:
    """Plot how single vs. multiple folding impact the error at a given layer."""
    single_tvd = tvd(circuit, num_folds=1).values()
    multiple_tvd = tvd(circuit, num_folds=5).values()

    labels = [f"L{i}" for i in range(len(circuit))]
    x = np.arange(len(labels))  # the label locations
    width = 0.35  # the width of the bars

    fig, ax = plt.subplots()
    rects1 = ax.bar(x - width/2, single_tvd, width, label="single")
    rects2 = ax.bar(x + width/2, multiple_tvd, width, label="multiple")

    # Add some text for labels, title and custom x-axis tick labels, etc.
    ax.set_xlabel(r"$L_{G_i \theta_i}$")
    ax.set_ylabel(r"$\eta^{(i)}$")
    ax.set_title("Single vs. multiple folding")
    ax.set_xticks(x, labels, rotation=60)

    ax.legend()

    ax.bar_label(rects1, padding=3)
    ax.bar_label(rects2, padding=3)

    fig.tight_layout()

    plt.show()
plot_single_vs_multiple_folding(circuit)
../_images/9c4b94cb3be99c1471c6fcd29dc3c891ff8ef7ebffbe9fb355daa3b2dc91f758.png

As can be seen, the amount of noise on each layer is increased if the number of folds on that layer are increased.

Executor#

Next, we define an executor function that will allow us to run our experiment

def executor(circuit: cirq.Circuit, shots: int = 10_000) -> float:
    """Returns the expectation value to be mitigated.

    Args:
        circuit: Circuit to run.
        shots: Number of times to execute the circuit to compute the expectation value.
    """
    qiskit_circuit = to_qiskit(circuit)

    # Transpile the circuit so it can be properly run
    exec_circuit = qiskit.transpile(
        qiskit_circuit,
        backend=backend,
        basis_gates=noise_model.basis_gates if noise_model else None,
        optimization_level=0, # Important to preserve folded gates.
    )
    # Run the circuit
    job = backend.run(exec_circuit, shots=shots)

    # Convert from raw measurement counts to the expectation value
    counts = job.result().get_counts()

    expectation_value = 0.0 if counts.get("0") is None else counts.get("0") / shots
    return expectation_value

Global folding with linear extrapolation#

First, for comparison, we apply ZNE with global folding on the entire circuit. We then compare the mitigated result of applying ZNE with linear extrapolation against the unmitigated value.

unmitigated = executor(circuit)

linear_factory = zne.inference.LinearFactory(scale_factors=[1.0, 1.5, 2.0, 2.5, 3.0])
mitigated = zne.execute_with_zne(circuit, executor, factory=linear_factory)

print(f"Unmitigated result {unmitigated:.3f}")
print(f"Mitigated result {mitigated:.3f}")
Unmitigated result 0.907
Mitigated result 0.976

Layerwise folding with linear extrapolation#

For contrast, we apply layerwise folding on only the layer with the most noise and use linear extrapolation. As above, we compare the mitigated and unmitigated values.

# Calculate the TVDs of each layer in the circuit (with `num_folds=3`):
tvds = tvd(circuit, num_folds=3)

# Fold noisiest layer only.
layer_to_fold = max(tvds, key=tvds.get)
fold_layer_func = zne.scaling.get_layer_folding(layer_to_fold)

mitigated = zne.execute_with_zne(circuit, executor, scale_noise=fold_layer_func, factory=linear_factory)
print(f"Mitigated (layerwise folding) result {mitigated:.3f}")
print(f"Unmitigated result {unmitigated:.3f}")
Mitigated (layerwise folding) result 0.920
Unmitigated result 0.907

Note

While doing layerwise folding on the noisiest layer will, on average, improve the mitigated value, it still will not eclipse the benefit of doing global folding.