Quantum Kernel Methods

This note describes the quantum kernel classifier and regressor implementations in qml.kernel_methods and qml.kernels.

The models are hybrid quantum-classical kernel estimators:

  • classical data is encoded into a quantum state
  • pairwise quantum state overlaps define a kernel
  • the kernel matrix is passed to a classical estimator
  • prediction is performed classically using the quantum-computed kernel

Unlike the variational quantum classifier, there are no trainable quantum circuit parameters in the current implementation.


Data

For classification, we consider a binary dataset:

\[ \mathcal{D} = \{(x_i, y_i)\}_{i=1}^N \]

where:

  • \(N\) is the number of samples
  • \(x_i \in \mathbb{R}^d\) is the feature vector for sample \(i\)
  • \(y_i \in \{0,1\}\) is the binary label for sample \(i\)
  • \(d\) is the feature dimension

For regression with QuantumKernelRegressor, the same notation is used except that the target is continuous:

  • \(y_i \in \mathbb{R}\) is the regression target for sample \(i\)

In the current implementation:

  • \(d = 2\)
  • the dataset is the two-moons dataset
  • features are standardised before entering the quantum feature map

Let:

\[ x = (x_1, x_2) \]

denote one standardised input sample.


Quantum feature map

A quantum kernel method begins by encoding each classical input \(x\) into a quantum state:

\[ |\phi(x)\rangle = U(x)\,|0\rangle^{\otimes n} \]

where:

  • \(n\) is the number of qubits
  • \(U(x)\) is the feature-map unitary
  • \(|0\rangle^{\otimes n}\) is the initial computational basis state

In the current implementation:

  • \(n = d\)
  • each feature is encoded by a single-qubit \(R_Y\) rotation
  • a short entangling chain is applied afterward

Feature map definition

For an input vector \(x \in \mathbb{R}^n\), the feature map is:

\[ U(x) = U_{\mathrm{ent}} \left( \prod_{j=1}^{n} R_Y(x_j) \right) \]

where:

  • \(x_j\) is the \(j\)th feature
  • \(R_Y(x_j)\) is a rotation about the \(Y\) axis on qubit \(j\)
  • \(U_{\mathrm{ent}}\) is the entangling unitary

The entangling unitary is a nearest-neighbour CNOT chain:

\[ U_{\mathrm{ent}} = \prod_{j=1}^{n-1} \mathrm{CNOT}_{j,j+1} \]

So the encoded quantum state is:

\[ |\phi(x)\rangle = U(x)\,|0\rangle^{\otimes n} \]

Quantum kernel

The kernel between two samples \(x_i\) and \(x_j\) is defined by the squared state overlap:

\[ K(x_i, x_j) = |\langle \phi(x_i) \mid \phi(x_j) \rangle|^2 \]

where:

  • \(K(x_i, x_j)\) is the kernel value between inputs \(x_i\) and \(x_j\)
  • \(|\phi(x_i)\rangle\) and \(|\phi(x_j)\rangle\) are the encoded quantum states

This kernel is:

  • symmetric
  • non-negative
  • bounded in \([0,1]\)

It acts as a similarity measure in the Hilbert space induced by the feature map.


Circuit evaluation of the kernel

The overlap can be computed using a single quantum circuit.

Starting from \(|0\rangle^{\otimes n}\), apply:

  1. \(U(x_i)\)
  2. \(U^\dagger(x_j)\)

The resulting state is:

\[ |\psi(x_i, x_j)\rangle = U^\dagger(x_j)\,U(x_i)\,|0\rangle^{\otimes n} \]

The kernel is then the probability of returning to the all-zero state:

\[ K(x_i, x_j) = |\langle 0^{\otimes n} \mid \psi(x_i, x_j)\rangle|^2 \]

equivalently,

\[ K(x_i, x_j) = |\langle 0^{\otimes n} \mid U^\dagger(x_j)\,U(x_i) \mid 0^{\otimes n}\rangle|^2 \]

In the implementation, the circuit returns the full computational-basis probability vector, and the kernel value is the probability of the zero state:

\[ K(x_i, x_j) = p_{0\cdots0} \]

where \(p_{0\cdots0}\) is the measured probability of bitstring \(00\cdots0\).


Kernel matrix

For a training set:

\[ \{x_i\}_{i=1}^{N_{\mathrm{train}}} \]

the training kernel matrix is:

\[ K^{\mathrm{train}} \in \mathbb{R}^{N_{\mathrm{train}} \times N_{\mathrm{train}}} \]

with entries:

\[ K^{\mathrm{train}}_{ij} = K(x_i, x_j) \]

where:

  • \(N_{\mathrm{train}}\) is the number of training samples
  • \(K^{\mathrm{train}}_{ij}\) is the kernel between training samples \(x_i\) and \(x_j\)

For test samples:

\[ \{x'_a\}_{a=1}^{N_{\mathrm{test}}} \]

the test kernel matrix is:

\[ K^{\mathrm{test}} \in \mathbb{R}^{N_{\mathrm{test}} \times N_{\mathrm{train}}} \]

with entries:

\[ K^{\mathrm{test}}_{aj} = K(x'_a, x_j) \]

where:

  • \(N_{\mathrm{test}}\) is the number of test samples
  • \(x'_a\) is the \(a\)th test input
  • \(x_j\) is the \(j\)th training input

So:

  • the training kernel matrix compares training samples to training samples
  • the test kernel matrix compares test samples to training samples

Classical support vector machine

Once the kernel matrix is computed, classification is performed by a classical SVM using the precomputed kernel.

The classifier decision function takes the form:

\[ f(x) = \sum_{i=1}^{N_{\mathrm{train}}} \alpha_i\,K(x_i, x) + b \]

where:

  • \(\alpha_i\) are learned dual coefficients
  • \(b\) is the bias term
  • \(x_i\) are support vectors from the training set
  • \(K(x_i, x)\) is the kernel between training point \(x_i\) and input \(x\)

The predicted class is determined by the sign of the decision function in the SVM formulation. In the binary setting used here, the classifier outputs labels in \(\{0,1\}\) after fitting on the training labels.

The quantum computer is used only for kernel evaluation. The optimisation of the SVM is entirely classical.


Kernel ridge regression

For regression, the same quantum fidelity kernel can be consumed by kernel ridge regression. Given training targets:

\[ y \in \mathbb{R}^{N_{\mathrm{train}}}, \]

kernel ridge regression solves the regularized dual problem:

\[ \alpha = (K^{\mathrm{train}} + \lambda I)^{-1} y \]

where:

  • \(K^{\mathrm{train}}\) is the quantum kernel matrix on training inputs
  • \(\lambda\) is the ridge regularization strength
  • \(I\) is the identity matrix with shape \(N_{\mathrm{train}} \times N_{\mathrm{train}}\)
  • \(\alpha\) are the learned dual coefficients

Predictions for a test input \(x'\) use its kernel vector against the training set:

\[ \hat{y}(x') = \sum_{i=1}^{N_{\mathrm{train}}} \alpha_i K(x_i, x'). \]

The package exposes this path through qml.kernels.QuantumKernelRegressor and the benchmark model name quantum_kernel_regressor. The quantum part still only computes the feature-map fidelity kernel; target fitting and prediction are classical kernel-ridge operations.


Why kernels are useful

Kernel methods are attractive because they separate two tasks:

  1. define a feature space
  2. fit a classifier in that feature space

The quantum circuit defines the feature space implicitly through the map:

\[ x \mapsto |\phi(x)\rangle \]

The classifier then works with inner products in that space, rather than explicit coordinates.

This means the model can represent nonlinear decision boundaries in the original input space even though the SVM optimisation remains classical.


Accuracy

After fitting the SVM, predictions are made on both train and test sets.

For any evaluation set of size \(M\), let:

  • \(y_i\) be the true label of sample \(i\)
  • \(\hat{y}_i\) be the predicted label of sample \(i\)

Accuracy is:

\[ \mathrm{Accuracy} = \frac{1}{M} \sum_{i=1}^{M} \mathbf{1}\{y_i = \hat{y}_i\} \]

where:

  • \(M\) is the number of evaluated samples
  • \(\mathbf{1}\{\cdot\}\) is the indicator function

The implementation reports:

  • training accuracy
  • test accuracy

Computational structure

Let:

  • \(N_{\mathrm{train}}\) be the number of training samples
  • \(N_{\mathrm{test}}\) be the number of test samples

Then the number of kernel evaluations is:

  • \(N_{\mathrm{train}}^2\) for the training kernel matrix
  • \(N_{\mathrm{test}}N_{\mathrm{train}}\) for the test kernel matrix

This means the current implementation is simple and clear, but not yet optimised for large datasets.

It is appropriate for small educational and research-scale examples.


Current implementation choices

The current quantum kernel implementations are intentionally minimal.

Included

  • binary classification
  • two-dimensional input
  • angle-based feature map
  • nearest-neighbour entangling structure
  • exact statevector simulation
  • precomputed-kernel SVM
  • precomputed-kernel ridge regression
  • train/test accuracy
  • train/test MSE and MAE for regression
  • returned train and test kernel matrices

Relation to the code

The implemented workflow is organised as follows:

  • qml.data prepares the dataset
  • qml.kernel_methods defines the small dataset-backed classifier workflow
  • qml.kernels defines reusable kernel estimators for user-supplied arrays
  • qml.metrics computes accuracy and regression metrics
  • qml.io_utils optionally saves results

So the notebook remains a package client, while the full kernel logic lives in the package.


Summary

The implemented quantum kernel family is defined by:

  1. a feature map \(U(x)\)
  2. encoded states \(|\phi(x)\rangle = U(x)|0\rangle^{\otimes n}\)
  3. a kernel $$ K(x_i,x_j)=|\langle \phi(x_i)\mid\phi(x_j)\rangle|^2 $$
  4. a training kernel matrix and test kernel matrix
  5. a classical SVM classifier or kernel-ridge regressor trained on the precomputed kernel

Formally:

\[ |\phi(x)\rangle = U(x)\,|0\rangle^{\otimes n} \]
\[ K(x_i, x_j) = |\langle \phi(x_i) \mid \phi(x_j) \rangle|^2 \]
\[ K^{\mathrm{train}}_{ij} = K(x_i, x_j) \]
\[ K^{\mathrm{test}}_{aj} = K(x'_a, x_j) \]
\[ f(x) = \sum_{i=1}^{N_{\mathrm{train}}} \alpha_i\,K(x_i, x) + b \]

This is the core quantum kernel workflow used in the repository. The classification and regression variants differ in the classical estimator that consumes the kernel, not in the quantum fidelity-kernel definition.