Winnow General Probability Calibrator
Winnow recalibrates confidence scores and provides FDR control for de novo peptide sequencing (DNS) workflows.
This repository hosts a pretrained, general-purpose calibrator that maps raw InstaNovo model confidences and complementary features (mass error, retention time, beam features, fragment matching features) to well-calibrated probabilities.
- Intended inputs: spectrum input data and corresponding MS/MS PSM results produced by InstaNovo
- Outputs: rescored and calibrated per-PSM probabilities in calibrated_confidence, with de novo FDR control.
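To make "well-calibrated" concrete: among PSMs assigned a probability near 0.9, roughly 90% should be correct. The sketch below computes a binned expected calibration error (ECE) for a labelled set. It only illustrates the concept and is not part of the Winnow API; the function name, the number of bins, and the sample values are assumptions made for this example.

```python
from typing import List


def expected_calibration_error(
    probs: List[float], correct: List[bool], n_bins: int = 10
) -> float:
    """Weighted mean gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, ok in zip(probs, correct):
        # Equal-width bins on [0, 1]; p == 1.0 falls into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, ok))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece


# Hypothetical labelled scores: ten PSMs scored 0.9, nine of them correct.
calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
# Same scores, but only five are correct: the scores are overconfident.
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
```

A value close to zero indicates that the reported probabilities match the observed accuracy; larger values indicate over- or under-confidence.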
What’s inside
- model.safetensors: trained classifier
- config.json: classifier hyperparameter settings
How to use
Python
from pathlib import Path
from huggingface_hub import snapshot_download
from winnow.calibration.calibrator import ProbabilityCalibrator
from winnow.datasets.data_loaders import InstaNovoDatasetLoader
from winnow.scripts.main import filter_dataset
from winnow.fdr.nonparametric import NonParametricFDRControl
# 1) Download model files
general_model = Path("general_model")
snapshot_download(
    repo_id="InstaDeepAI/winnow-general-model",
    repo_type="model",
    local_dir=general_model,
)
# 2) Load calibrator
calibrator = ProbabilityCalibrator.load(pretrained_model_name_or_path=general_model)
# 3) Load your dataset (InstaNovo-style config)
dataset = InstaNovoDatasetLoader().load(
    data_path="path_to_spectrum_data.parquet",
    predictions_path="path_to_instanovo_predictions.csv",
)
dataset = filter_dataset(dataset) # standard Winnow filtering
# 4) Predict calibrated confidences
calibrator.predict(dataset) # adds dataset.metadata["calibrated_confidence"]
# 5) Optional: FDR control on calibrated confidence
fdr = NonParametricFDRControl()
fdr.fit(dataset.metadata["calibrated_confidence"])
cutoff = fdr.get_confidence_cutoff(0.05) # 5% FDR cutoff
dataset.metadata["keep@5%"] = dataset.metadata["calibrated_confidence"] >= cutoff
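For intuition on step 5: when each score is a calibrated probability that the PSM is correct, 1 minus the score is that PSM's expected contribution to the false discoveries, so the FDR of an accepted set can be estimated without a decoy database. The following is a minimal sketch of that idea under this assumption. It is not the implementation behind NonParametricFDRControl, and the function name and sample scores are made up for illustration.

```python
from typing import List, Optional


def estimate_fdr_cutoff(
    confidences: List[float], fdr_threshold: float
) -> Optional[float]:
    """Return the confidence of the last ranked PSM at which the running
    FDR estimate is still within the threshold, or None if none qualifies.
    """
    ranked = sorted(confidences, reverse=True)
    expected_false = 0.0
    cutoff = None
    for n_accepted, conf in enumerate(ranked, start=1):
        # Each accepted PSM adds (1 - conf) expected false discoveries.
        expected_false += 1.0 - conf
        if expected_false / n_accepted <= fdr_threshold:
            cutoff = conf
    return cutoff


# Hypothetical calibrated confidences for six PSMs.
scores = [0.99, 0.98, 0.95, 0.90, 0.60, 0.30]
cutoff = estimate_fdr_cutoff(scores, fdr_threshold=0.05)
kept = [s for s in scores if cutoff is not None and s >= cutoff]
```

With these sample scores, the four highest-scoring PSMs are kept: their mean expected error is 0.045, while adding the fifth would raise the estimate above 0.05.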
CLI
# After `pip install winnow`
winnow predict \
    data_loader=instanovo \
    dataset.spectrum_path_or_directory=my_data.parquet \
    dataset.predictions_path=my_preds.csv \
    calibrator.pretrained_model_name_or_path=config_with_dataset_paths.yaml \
    fdr_control.fdr_threshold=0.05 \
    output_folder=outputs
Inputs and outputs
Required columns for calibration:
Spectrum data (parquet/ipc/mgf)
- spectrum_id (string): unique spectrum identifier
- experiment_name (string): MS run identifier
- retention_time (float): retention time (seconds)
- precursor_charge (float): charge of the precursor ion (from MS1)
- precursor_mz (float): mass-to-charge of the precursor ion (from MS1)
- mz_array (list[float]): mass-to-charge values of the MS2 spectrum
- intensity_array (list[float]): intensity values of the MS2 spectrum
Beam predictions (csv)
- spectrum_id (string)
- predictions (string): top prediction, untokenised sequence
- predictions_tokenised (string): comma-separated tokens for the top prediction
- log_probability (float): top prediction log probability
- token_log_probabilities (list[float]): per-token log-probabilities for the top prediction
- predictions_beam_k (string): untokenised sequence for beam k (k≥0)
- log_probability_beam_k (float)
- token_log_probabilities_k (string/list-encoded)
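Before running calibration, it can help to confirm that a predictions file carries the required top-prediction columns. The sketch below checks a CSV header using only the standard library. The helper name and the sample text are hypothetical, and the per-beam columns are left out because their names depend on the beam index k.

```python
import csv
import io
from typing import List

# Top-prediction columns listed above; per-beam columns are not checked here.
REQUIRED_PREDICTION_COLUMNS = [
    "spectrum_id",
    "predictions",
    "predictions_tokenised",
    "log_probability",
    "token_log_probabilities",
]


def missing_columns(csv_text: str, required: List[str]) -> List[str]:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return [name for name in required if name not in header]


# Made-up sample that lacks two of the required columns.
sample = "spectrum_id,predictions,log_probability\nscan_1,PEPTIDE,-0.12\n"
missing = missing_columns(sample, REQUIRED_PREDICTION_COLUMNS)
```

For a file on disk, the text can be read with Path("my_preds.csv").read_text() and passed to the same helper.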
Outputs:
- metadata.csv: spectrum metadata and computed features. Contains everything except the prediction and FDR columns, i.e.:
  - spectrum_id, experiment_name, precursor_mz, precursor_charge, retention_time, etc. (all pass-through spectrum columns)
  - All computed feature columns, including intermediate results (mass_error_da, irt_error, ion_matches, margin, etc.)
- preds_and_fdr_metrics.csv: predictions and FDR results. Always contains:
  - spectrum_id
  - prediction
  - calibrated_confidence: calibrated probability
  - psm_fdr
  - psm_q_value
  - Optional: psm_pep
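Because the two output files share spectrum_id, they can be joined to inspect accepted PSMs alongside their spectrum metadata. The following sketch does this with the standard library on small in-memory samples. The column names come from the lists above, while the row values and the 0.05 q-value threshold are made up for illustration.

```python
import csv
import io

# Made-up stand-ins for metadata.csv and preds_and_fdr_metrics.csv.
metadata_csv = (
    "spectrum_id,precursor_mz,retention_time\n"
    "s1,500.25,120.0\n"
    "s2,612.80,340.5\n"
)
preds_csv = (
    "spectrum_id,prediction,calibrated_confidence,psm_fdr,psm_q_value\n"
    "s1,PEPTIDEK,0.98,0.02,0.02\n"
    "s2,ACDEFGHK,0.40,0.31,0.31\n"
)


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


# Index metadata rows by spectrum_id for the join.
metadata_by_id = {row["spectrum_id"]: row for row in read_rows(metadata_csv)}

accepted = []
for row in read_rows(preds_csv):
    # Keep PSMs whose q-value is within the chosen threshold.
    if float(row["psm_q_value"]) <= 0.05:
        accepted.append({**metadata_by_id.get(row["spectrum_id"], {}), **row})
```

To apply this to real results, read the two files from the output folder instead of the in-memory strings.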
Training data
- The general model was trained on a pooled, labelled set spanning multiple public datasets to encourage cross-dataset generalisation:
- HeLa single-shot (PXD044934)
- Candidatus Scalindua Brodae (PXD044934)
- Wound exudates (PXD025748)
- HepG2 (PXD019483)
- Immunopeptidomics (PXD006939)
- HeLa degradome (PXD044934)
- Snake venoms (PXD036161)
- Therapeutic nanobodies (PXD044934)
- This model uses fragment match features, iRT features, beam features, token score features and the mass error feature.
- Predictions were obtained using InstaNovo v1.2.0 with beam search set to 5 beams.
Citation
If you use Winnow in your research, please cite our preprint: De novo peptide sequencing rescoring and FDR estimation with Winnow
@article{mabona2025novopeptidesequencingrescoring,
title = {De novo peptide sequencing rescoring and FDR estimation with Winnow},
author = {Amandla Mabona and Jemma Daniel and Henrik Servais Janssen Knudsen and
Rachel Catzel and Kevin Michael Eloff and Erwin M. Schoof and Nicolas
Lopez Carranza and Timothy P. Jenkins and Jeroen Van Goey and
Konstantinos Kalogeropoulos},
year = {2025},
eprint = {2509.24952},
archivePrefix = {arXiv},
primaryClass = {q-bio.QM},
url = {https://arxiv.org/abs/2509.24952},
}
If you use this pretrained calibrator, please cite:
@misc{instadeep_ltd_2025,
author = { InstaDeep Ltd },
title = { winnow-general-model (Revision ef3002daf2254369d04095731c76a022553ba63a) },
year = 2026,
url = { https://huggingface.co/InstaDeepAI/winnow-general-model },
doi = { 10.57967/hf/6611 },
publisher = { Hugging Face }
}
If you use the InstaNovo model to generate predictions, please also cite: InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale proteomics experiments
@article{eloff_kalogeropoulos_2025_instanovo,
title = {InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale
proteomics experiments},
author = {Eloff, Kevin and Kalogeropoulos, Konstantinos and Mabona, Amandla and Morell,
Oliver and Catzel, Rachel and Rivera-de-Torre, Esperanza and Berg Jespersen,
Jakob and Williams, Wesley and van Beljouw, Sam P. B. and Skwark, Marcin J.
and Laustsen, Andreas Hougaard and Brouns, Stan J. J. and Ljungars,
Anne and Schoof, Erwin M. and Van Goey, Jeroen and auf dem Keller, Ulrich and
Beguir, Karim and Lopez Carranza, Nicolas and Jenkins, Timothy P.},
year = 2025,
month = {Mar},
day = 31,
journal = {Nature Machine Intelligence},
doi = {10.1038/s42256-025-01019-5},
issn = {2522-5839},
url = {https://doi.org/10.1038/s42256-025-01019-5}
}
Contact
For issues with this pretrained model or usage in Winnow, please open an issue on the Winnow GitHub: https://github.com/instadeepai/winnow