Quantum Kernel Methods¶

This note describes the quantum kernel classifier implemented in qml.kernel_methods.

The model is a hybrid quantum–classical classifier:

classical data is encoded into a quantum state
pairwise quantum state overlaps define a kernel
the kernel matrix is passed to a classical support vector machine (SVM)
prediction is performed classically using the quantum-computed kernel

Unlike the variational quantum classifier, there are no trainable quantum circuit parameters in the current implementation.

Data¶

We consider a binary classification dataset:

\[ \mathcal{D} = \{(x_i, y_i)\}_{i=1}^N \]

where:

$N$ is the number of samples
$x_i \in \mathbb{R}^d$ is the feature vector for sample $i$
$y_i \in \{0,1\}$ is the binary label for sample $i$
$d$ is the feature dimension

In the current implementation:

$d = 2$
the dataset is the two-moons dataset
features are standardised before entering the quantum feature map

Let:

\[ x = (x_1, x_2) \]

denote one standardised input sample.

Quantum feature map¶

A quantum kernel method begins by encoding each classical input $x$ into a quantum state:

\[ |\phi(x)\rangle = U(x)\,|0\rangle^{\otimes n} \]

where:

$n$ is the number of qubits
$U(x)$ is the feature-map unitary
$|0\rangle^{\otimes n}$ is the initial computational basis state

In the current implementation:

$n = d$
each feature is encoded by a single-qubit $R_Y$ rotation
a short entangling chain is applied afterward

Feature map definition¶

For an input vector $x \in \mathbb{R}^n$, the feature map is:

\[ U(x) = U_{\mathrm{ent}} \left( \prod_{j=1}^{n} R_Y(x_j) \right) \]

where:

$x_j$ is the $j$th feature
$R_Y(x_j)$ is a rotation about the $Y$ axis on qubit $j$
$U_{\mathrm{ent}}$ is the entangling unitary

The entangling unitary is a nearest-neighbour CNOT chain:

\[ U_{\mathrm{ent}} = \prod_{j=1}^{n-1} \mathrm{CNOT}_{j,j+1} \]

So the encoded quantum state is:

\[ |\phi(x)\rangle = U(x)\,|0\rangle^{\otimes n} \]

Quantum kernel¶

The kernel between two samples $x_i$ and $x_j$ is defined by the squared state overlap:

\[ K(x_i, x_j) = |\langle \phi(x_i) \mid \phi(x_j) \rangle|^2 \]

where:

$K(x_i, x_j)$ is the kernel value between inputs $x_i$ and $x_j$
$|\phi(x_i)\rangle$ and $|\phi(x_j)\rangle$ are the encoded quantum states

This kernel is:

symmetric
non-negative
bounded in $[0,1]$

It acts as a similarity measure in the Hilbert space induced by the feature map.

Circuit evaluation of the kernel¶

The overlap can be computed using a single quantum circuit.

Starting from $|0\rangle^{\otimes n}$, apply:

$U(x_i)$
$U^\dagger(x_j)$

The resulting state is:

\[ |\psi(x_i, x_j)\rangle = U^\dagger(x_j)\,U(x_i)\,|0\rangle^{\otimes n} \]

The kernel is then the probability of returning to the all-zero state:

\[ K(x_i, x_j) = |\langle 0^{\otimes n} \mid \psi(x_i, x_j)\rangle|^2 \]

equivalently,

\[ K(x_i, x_j) = |\langle 0^{\otimes n} \mid U^\dagger(x_j)\,U(x_i) \mid 0^{\otimes n}\rangle|^2 \]

In the implementation, the circuit returns the full computational-basis probability vector, and the kernel value is the probability of the zero state:

\[ K(x_i, x_j) = p_{0\cdots0} \]

where $p_{0\cdots0}$ is the measured probability of bitstring $00\cdots0$.

Kernel matrix¶

For a training set:

\[ \{x_i\}_{i=1}^{N_{\mathrm{train}}} \]

the training kernel matrix is:

\[ K^{\mathrm{train}} \in \mathbb{R}^{N_{\mathrm{train}} \times N_{\mathrm{train}}} \]

with entries:

\[ K^{\mathrm{train}}_{ij} = K(x_i, x_j) \]

where:

$N_{\mathrm{train}}$ is the number of training samples
$K^{\mathrm{train}}_{ij}$ is the kernel between training samples $x_i$ and $x_j$

For test samples:

\[ \{x'_a\}_{a=1}^{N_{\mathrm{test}}} \]

the test kernel matrix is:

\[ K^{\mathrm{test}} \in \mathbb{R}^{N_{\mathrm{test}} \times N_{\mathrm{train}}} \]

with entries:

\[ K^{\mathrm{test}}_{aj} = K(x'_a, x_j) \]

where:

$N_{\mathrm{test}}$ is the number of test samples
$x'_a$ is the $a$th test input
$x_j$ is the $j$th training input

So:

the training kernel matrix compares training samples to training samples
the test kernel matrix compares test samples to training samples

Classical support vector machine¶

Once the kernel matrix is computed, classification is performed by a classical SVM using the precomputed kernel.

The classifier decision function takes the form:

\[ f(x) = \sum_{i=1}^{N_{\mathrm{train}}} \alpha_i\,K(x_i, x) + b \]

where:

$\alpha_i$ are learned dual coefficients
$b$ is the bias term
$x_i$ are support vectors from the training set
$K(x_i, x)$ is the kernel between training point $x_i$ and input $x$

The predicted class is determined by the sign of the decision function in the SVM formulation. In the binary setting used here, the classifier outputs labels in $\{0,1\}$ after fitting on the training labels.

The quantum computer is used only for kernel evaluation. The optimisation of the SVM is entirely classical.

Why kernels are useful¶

Kernel methods are attractive because they separate two tasks:

define a feature space
fit a classifier in that feature space

The quantum circuit defines the feature space implicitly through the map:

\[ x \mapsto |\phi(x)\rangle \]

The classifier then works with inner products in that space, rather than explicit coordinates.

This means the model can represent nonlinear decision boundaries in the original input space even though the SVM optimisation remains classical.

Accuracy¶

After fitting the SVM, predictions are made on both train and test sets.

For any evaluation set of size $M$, let:

$y_i$ be the true label of sample $i$
$\hat{y}_i$ be the predicted label of sample $i$

Accuracy is:

\[ \mathrm{Accuracy} = \frac{1}{M} \sum_{i=1}^{M} \mathbf{1}\{y_i = \hat{y}_i\} \]

where:

$M$ is the number of evaluated samples
$\mathbf{1}\{\cdot\}$ is the indicator function

The implementation reports:

training accuracy
test accuracy

Computational structure¶

Let:

$N_{\mathrm{train}}$ be the number of training samples
$N_{\mathrm{test}}$ be the number of test samples

Then the number of kernel evaluations is:

$N_{\mathrm{train}}^2$ for the training kernel matrix
$N_{\mathrm{test}}N_{\mathrm{train}}$ for the test kernel matrix

This means the current implementation is simple and clear, but not yet optimised for large datasets.

It is appropriate for small educational and research-scale examples.

Current implementation choices¶

The current quantum kernel classifier is intentionally minimal.

Included¶

binary classification
two-dimensional input
angle-based feature map
nearest-neighbour entangling structure
exact statevector simulation
precomputed-kernel SVM
train/test accuracy
returned train and test kernel matrices

Relation to the code¶

The implemented workflow is organised as follows:

qml.data prepares the dataset
qml.kernel_methods defines the feature map, kernel circuit, kernel matrices, and SVM workflow
qml.metrics computes accuracy
qml.io_utils optionally saves results

So the notebook remains a package client, while the full kernel logic lives in the package.

Summary¶

The implemented quantum kernel classifier is defined by:

a feature map $U(x)$
encoded states $|\phi(x)\rangle = U(x)|0\rangle^{\otimes n}$
a kernel $$ K(x_i,x_j)=|\langle \phi(x_i)\mid\phi(x_j)\rangle|^2 $$
a training kernel matrix and test kernel matrix
a classical SVM trained on the precomputed kernel

Formally:

\[ |\phi(x)\rangle = U(x)\,|0\rangle^{\otimes n} \]

\[ K(x_i, x_j) = |\langle \phi(x_i) \mid \phi(x_j) \rangle|^2 \]

\[ K^{\mathrm{train}}_{ij} = K(x_i, x_j) \]

\[ K^{\mathrm{test}}_{aj} = K(x'_a, x_j) \]

\[ f(x) = \sum_{i=1}^{N_{\mathrm{train}}} \alpha_i\,K(x_i, x) + b \]

This is the core quantum kernel workflow used in the repository.