Benchmark Utilities¶
The qml.benchmarks module provides helpers for comparing quantum and classical models across multiple random seeds.
Benchmarking enables:
- reproducible evaluation of model performance
- comparison between quantum and classical approaches
- estimation of performance variability due to stochastic training effects
- consistent experiment logging
Both classification and regression workflows are supported.
Overview¶
Benchmark functions run multiple training jobs using different random seeds and aggregate performance metrics.
Typical workflow:
- choose models to compare
- run multiple seeds
- compute mean and standard deviation of metrics
- optionally save results
Example metrics include:
- classification accuracy
- regression MSE / MAE
- final loss values
- variability across seeds
Results are returned as structured dictionaries and can optionally be saved to JSON.
Classification Benchmarks¶
Compare multiple classifiers on the same dataset.
Supported models:
vqc
qcnn
quantum_kernel
logistic_regression
svm_classifier
mlp_classifier
Example:
from qml.benchmarks import compare_classification_models

result = compare_classification_models(
    models=["vqc", "qcnn", "quantum_kernel", "svm_classifier"],
    seeds=[0, 1, 2, 3],
    n_samples=200,
    noise=0.1,
)
Returned structure:
{
    "benchmark_type": "classification",
    "models": [...],
    "runs": [...],
    "summary": {
        "vqc": {
            "train_accuracy": {"mean": ..., "std": ...},
            "test_accuracy": {"mean": ..., "std": ...},
            "n_runs": 4
        }
    }
}
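The aggregated metrics can be read directly from this dictionary. A minimal sketch, assuming result holds the return value of the example above:

# Read mean/std of the VQC test accuracy from the summary block.
acc = result["summary"]["vqc"]["test_accuracy"]
print(f"VQC test accuracy: {acc['mean']:.3f} ± {acc['std']:.3f}")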
Each run record includes:
{
    "model": "vqc",
    "seed": 0,
    "train_accuracy": ...,
    "test_accuracy": ...,
    "final_loss": ...
}
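Per-seed records can be filtered by model name. A small sketch, assuming the run-record layout above:

# Print the per-seed test accuracy for a single model.
for run in result["runs"]:
    if run["model"] == "vqc":
        print(f"seed {run['seed']}: test accuracy {run['test_accuracy']}")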
Regression Benchmarks¶
Compare regression models on the same dataset.
Supported models:
vqr
ridge_regression
mlp_regressor
Example:
from qml.benchmarks import compare_regression_models

result = compare_regression_models(
    models=["vqr", "ridge_regression"],
    seeds=[0, 1, 2],
    n_samples=200,
    noise=0.1,
)
Returned structure:
{
    "benchmark_type": "regression",
    "summary": {
        "vqr": {
            "train_mse": {"mean": ..., "std": ...},
            "test_mse": {"mean": ..., "std": ...},
            "train_mae": {"mean": ..., "std": ...},
            "test_mae": {"mean": ..., "std": ...},
            "n_runs": 3
        }
    }
}
Each run record includes:
{
    "model": "vqr",
    "seed": 0,
    "train_mse": ...,
    "test_mse": ...,
    "train_mae": ...,
    "test_mae": ...,
    "final_loss": ...
}
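To compare models side by side, the summary block can be iterated directly. A sketch, assuming result holds the return value of the regression example above:

# Report mean ± std test MSE for every benchmarked regression model.
for model, stats in result["summary"].items():
    mse = stats["test_mse"]
    print(f"{model}: test MSE {mse['mean']:.4f} ± {mse['std']:.4f}")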
CLI Usage¶
Classification benchmark:
python -m qml benchmark classification \
    --models vqc qcnn quantum_kernel svm_classifier logistic_regression \
    --seeds 123 456 789
Regression benchmark:
python -m qml benchmark regression \
    --models vqr ridge_regression mlp_regressor \
    --seeds 123 456
Default settings:
- samples: 200
- noise: 0.1
- test split: 0.25
- seed: 123
Saving Benchmark Results¶
Results can be saved to disk:
compare_classification_models(
    seeds=[0, 1, 2],
    save=True,
)
Saved files are placed in:
results/benchmarks/
Example output file:
classification_benchmark.json
Saved JSON includes:
- individual run records
- aggregated metrics
- dataset configuration
This allows reproducibility and later analysis.
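A saved benchmark can be reloaded with the standard json module. A minimal sketch; the filename follows the example above, so check results/benchmarks/ for the actual output:

import json
from pathlib import Path

# Reload a previously saved benchmark for later analysis.
path = Path("results/benchmarks/classification_benchmark.json")
with path.open() as f:
    saved = json.load(f)

print(list(saved["summary"].keys()))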
Model Selection¶
Models are referenced by string identifiers.
Classification:
vqc
qcnn
quantum_kernel
logistic_regression
svm_classifier
mlp_classifier
Regression:
vqr
ridge_regression
mlp_regressor
Invalid model names raise an error.
Example:
compare_classification_models(
    models=["vqc", "invalid_model"]
)
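The error can be handled like any other Python exception. A hedged sketch: the exact exception type is an assumption here, not something this page specifies:

# Assumes a ValueError is raised for unknown identifiers; adjust if the
# module raises a different exception type.
try:
    compare_classification_models(models=["vqc", "invalid_model"])
except ValueError as err:
    print(f"Unknown model identifier: {err}")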
Multi-seed Evaluation¶
Variational quantum models depend on:
- random parameter initialisation
- optimiser stochasticity
- dataset sampling variability
Performance should therefore be evaluated across multiple seeds.
Aggregate statistics:
- mean across seeds
- standard deviation across seeds
These values are computed for each metric, as the sketch below illustrates.
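Conceptually, the aggregation collects one value per seed and reduces it. A minimal illustration, not the module's internal implementation, assuming result from the classification example earlier:

import numpy as np

# One value per seed for a single model, reduced to mean and std.
per_seed = [run["test_accuracy"] for run in result["runs"] if run["model"] == "vqc"]
print(f"mean={np.mean(per_seed):.3f}, std={np.std(per_seed):.3f}")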
Relationship to Other Modules¶
Benchmark utilities call the following workflows:
Classification:
qml.classifiers.run_vqc
qml.qcnn.run_qcnn
qml.kernel_methods.run_quantum_kernel_classifier
qml.classical_baselines.run_logistic_classifier
qml.classical_baselines.run_svm_classifier
qml.classical_baselines.run_mlp_classifier
Regression:
qml.regression.run_vqr
qml.classical_baselines.run_ridge_regression
qml.classical_baselines.run_mlp_regressor
Datasets are generated using shared utilities from qml.data and qml.datasets, ensuring consistent experimental conditions across models.
When to Use Benchmarks¶
Benchmarking is useful when:
- comparing quantum vs classical performance
- testing sensitivity to optimiser settings
- evaluating ansatz depth
- studying generalisation performance
- generating reproducible experiment summaries
Typical workflow (see the end-to-end sketch after this list):
- explore behaviour in notebooks
- run benchmark across seeds
- analyse aggregated metrics
- refine model configuration
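Putting the steps together, an end-to-end sketch using only the parameters shown earlier on this page:

from qml.benchmarks import compare_classification_models

# Run a small benchmark across three seeds, persist the results,
# and report per-model variability.
result = compare_classification_models(
    models=["vqc", "logistic_regression"],
    seeds=[0, 1, 2],
    n_samples=200,
    noise=0.1,
    save=True,  # written to results/benchmarks/
)

for model, stats in result["summary"].items():
    acc = stats["test_accuracy"]
    print(f"{model}: {acc['mean']:.3f} ± {acc['std']:.3f} over {stats['n_runs']} seeds")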