CAUTION: this package is under heavy development and is largely LLM generated. Breaking changes are expected and documentation might be out of date.
A Pythonic wrapper around pyOpenMS for mass spectrometry data analysis
openms-python provides an intuitive, Python-friendly interface to OpenMS, making mass spectrometry data analysis feel natural for Python developers and data scientists.
pyOpenMS is a Python binding for the powerful OpenMS C++ library. However, being a direct C++ binding, it doesn't always feel "Pythonic". This package wraps pyOpenMS to provide:
✅ Pythonic properties instead of verbose getters/setters
✅ Intuitive iteration with smart filtering
✅ Identification helpers for protein/peptide results with idXML IO
✅ pandas DataFrame integration for data analysis
✅ Method chaining for processing pipelines
✅ Type hints for better IDE support
✅ Clean, documented API with examples
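To make the first bullet concrete, here is a minimal sketch of how a wrapper can turn getter/setter calls into properties. `RawSpectrum` and `SpectrumView` are hypothetical stand-ins written for this illustration; they are not the classes shipped by openms-python or pyOpenMS.

```python
class RawSpectrum:
    """Stand-in for a binding object that only offers getter/setter methods."""

    def __init__(self, rt: float, ms_level: int) -> None:
        self._rt = rt
        self._ms_level = ms_level

    def getRT(self) -> float:
        return self._rt

    def setRT(self, value: float) -> None:
        self._rt = float(value)

    def getMSLevel(self) -> int:
        return self._ms_level


class SpectrumView:
    """Hypothetical wrapper that exposes the getters as Python properties."""

    def __init__(self, native: RawSpectrum) -> None:
        self.native = native  # keep the wrapped object reachable

    @property
    def retention_time(self) -> float:
        return self.native.getRT()

    @retention_time.setter
    def retention_time(self, value: float) -> None:
        self.native.setRT(value)

    @property
    def is_ms1(self) -> bool:
        return self.native.getMSLevel() == 1


view = SpectrumView(RawSpectrum(rt=12.5, ms_level=1))
view.retention_time = 30.0  # forwards to setRT on the wrapped object
print(view.retention_time, view.is_ms1)
```

Keeping a `native` attribute on the wrapper means the full underlying API stays available whenever a property is missing.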
import pyopenms as oms

# Plain pyOpenMS: explicit loader object, index loop, and getter calls
exp = oms.MSExperiment()
oms.MzMLFile().load("data.mzML", exp)
n_spectra = exp.getNrSpectra()
for i in range(n_spectra):
    spec = exp.getSpectrum(i)
    if spec.getMSLevel() == 1:
        rt = spec.getRT()
        peaks = spec.get_peaks()
        mz = peaks[0]
        intensity = peaks[1]
        print(f"RT: {rt:.2f}s, mz: {mz}, intens: {intensity}")

from openms_python import Py_MSExperiment

# The same task with openms-python
exp = Py_MSExperiment.from_file("data.mzML")
print(f"Loaded {len(exp)} spectra")
for spec in exp.ms1_spectra():
    print(f"RT: {spec.retention_time:.2f}s, mz: {spec.mz}, intens: {spec.intensity}")
    # or convert the peaks to a pandas DataFrame
    df = spec.to_dataframe()

from openms_python import Py_MSExperiment
# Load experiment
exp = Py_MSExperiment.from_file('data.mzML')
# Get basic info
print(f"Total spectra: {len(exp)}")
print(f"RT range: {exp.rt_range}")
print(f"MS levels: {exp.ms_levels}")
# Print summary
exp.print_summary()

New to OpenMS or don't have data handy? openms_python ships with a tiny
`small.mzML` example that is perfect for quick experiments.
from openms_python import Py_MSExperiment, get_example
example_path = get_example("small.mzML")
exp = Py_MSExperiment.from_file(example_path)
print(f"Loaded {len(exp)} spectra from the example file")
# Or load the raw bytes directly
example_bytes = get_example("small.mzML", load=True)

Protein and peptide identification results often originate from search
engines that export idXML files. The `Identifications` helper reads these
files into convenient containers for downstream processing.
from openms_python import Identifications
# Load both protein and peptide identifications from idXML
ids = Identifications.from_idxml("search_results.idXML")
print(ids.summary())
# {'proteins': 12, 'peptides': 42, 'protein_hits': 12, 'peptide_hits': 42}
# Filter peptides by their top-hit score while keeping the protein context
high_conf = ids.filter_peptides_by_score(0.05)
# Look up peptides that match a particular protein accession
matches = high_conf.peptides_for_protein("P01234")
for pep in matches:
    hit = pep.getHits()[0]
    print(hit.getSequence(), hit.getScore())
# Persist the curated results back to disk
high_conf.to_idxml("curated_results.idXML")

Multiple FeatureMap runs can be aligned and converted into a single
ConsensusMap directly from Python. The `Py_ConsensusMap.align_and_link`
helper performs three steps:
- Copies the incoming feature maps to avoid mutating your data
- Aligns the feature maps with your choice of OpenMS alignment algorithm
- Links the aligned runs using your choice of feature grouping algorithm
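The three steps can be sketched in plain Python on toy data. This is only an illustration of the copy, align, link shape: the constant RT shift and the tolerance-based grouping below are deliberate simplifications, not the OpenMS alignment or feature grouping algorithms, and the feature values are made up.

```python
from copy import deepcopy
from statistics import median

# Each run is a list of (rt, mz) features; values are invented for illustration.
run_a = [(100.0, 500.25), (200.0, 600.30), (300.0, 700.35)]
run_b = [(105.0, 500.25), (205.0, 600.30), (305.0, 700.35)]


def align_and_link(runs, mz_tol=0.01, rt_tol=2.0):
    # Step 1: copy so the caller's lists are never mutated.
    runs = deepcopy(runs)
    reference = runs[0]

    # Step 2: align every other run to the first one with a constant RT shift
    # (median RT difference of features paired by position).
    aligned = [reference]
    for run in runs[1:]:
        shift = median(rt - ref_rt for (rt, _), (ref_rt, _) in zip(run, reference))
        aligned.append([(rt - shift, mz) for rt, mz in run])

    # Step 3: link features whose RT and m/z agree within the tolerances.
    groups = []
    for ref_rt, ref_mz in reference:
        members = [(ref_rt, ref_mz)]
        for run in aligned[1:]:
            for rt, mz in run:
                if abs(rt - ref_rt) <= rt_tol and abs(mz - ref_mz) <= mz_tol:
                    members.append((rt, mz))
                    break
        groups.append(members)
    return groups


consensus = align_and_link([run_a, run_b])
print(len(consensus), [len(group) for group in consensus])
```

Because of the copy in step 1, `run_b` still holds its original retention times after the call.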
from openms_python import Py_FeatureMap, Py_ConsensusMap
feature_maps = [
Py_FeatureMap.from_dataframe(run_a_df),
Py_FeatureMap.from_dataframe(run_b_df),
]
consensus = Py_ConsensusMap.align_and_link(
feature_maps,
alignment_method="pose_clustering", # or "identification" / "identity"
alignment_params={"max_rt_shift": 15.0},
grouping_method="qt", # or "kd" / "labeled" / "unlabeled" (default: "qt")
grouping_params={"distance_RT:max_difference": 100.0},
)
print(f"Consensus contains {len(consensus)} features")

The helper returns a fresh Py_ConsensusMap instance that can be exported,
converted to a pandas DataFrame, or iterated for downstream analysis.
The wrappers expose multiple entry points for inferring proteins directly from the Python API: you can start from identification files, feature maps, or full consensus maps.
from openms_python import Identifications, Py_FeatureMap, Py_ConsensusMap
# 1) Run inference straight from an idXML file
ids = Identifications.from_idxml("search_results.idXML")
protein_summary = ids.infer_proteins(algorithm="bayesian")
print(protein_summary.summary())
# 2) Trigger inference on a feature map (assigned + unassigned peptides)
fmap = Py_FeatureMap().load("sample.featureXML")
proteins = fmap.infer_proteins(include_unassigned=True)
proteins.to_idxml("sample_proteins.idXML")
# 3) Operate directly on a consensus map
consensus = Py_ConsensusMap().load("merged.consensusXML")
consensus.infer_proteins(algorithm="basic")
# Optionally compute quantitative protein ratios in place
consensus.infer_protein_quantities(reference_map=1)
consensus.store("merged_with_proteins.consensusXML")

All helpers share the same parameter handling: they accept native
pyopenms parameters (`oms.Param`) or plain dictionaries, and return
`Identifications` or the map instance itself for easy method chaining.
Looking for a larger end-to-end example? tests/test_idperformance.py ships with
the repository as a miniature-yet-realistic identification workflow that ties
many wrapper conveniences together. The test builds a tiny FASTA database,
simulates MS2 spectra, runs a Hyperscore-style search (complete with target/decoy
competition and q-value estimation), and even records how long the round-trip
through mzML takes. It demonstrates how concise—and how performant—a simple
search engine can be when built with the high-level helpers in
openms_python. Reuse the test as inspiration for bespoke pipelines or as a
regression harness when experimenting with search-related utilities.
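For readers unfamiliar with target/decoy q-value estimation, the following is a minimal, self-contained sketch of the idea. It is not the code from `tests/test_idperformance.py`; the function name, the scores, and the simple `decoys / targets` FDR estimate are assumptions chosen for illustration.

```python
def q_values(scores, is_decoy):
    """Return one q-value per hit, in input order (higher score = better)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Walk down the ranked list and estimate FDR = decoys / targets so far.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # A q-value is the smallest FDR at which a hit is still accepted,
    # so take a running minimum from the bottom of the list upwards.
    result = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        result[order[rank]] = running_min
    return result


scores = [9.1, 8.7, 7.9, 7.2, 6.5, 5.0]
is_decoy = [False, False, True, False, False, True]
print([round(q, 3) for q in q_values(scores, is_decoy)])
```

Hits are then typically accepted when their q-value falls below a chosen threshold such as 0.01.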
Managing multi-sample, multi-fraction experiments? The Py_ExperimentalDesign
wrapper makes it straightforward to work with OpenMS experimental design files
that describe sample layouts, fractionation schemes, and labeling strategies.
tests/test_py_experimentaldesign.py provides comprehensive examples of loading
and querying experimental designs, including support for fractionated workflows,
label-free and labeled quantitation setups, and integration with feature maps,
consensus maps, and identification results. The wrapper exposes Pythonic
properties for quick access to sample counts, fraction information, and design
summaries—perfect for building sample-aware quantitation pipelines or validating
experimental metadata before analysis.
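To show what such design summaries boil down to, here is a small sketch that derives sample count, file count, and fractionation status from a tab-separated table using only the standard library. The column names are taken from the `from_dataframe` example in this section; `summarize_design` is a hypothetical helper, not part of the wrapper.

```python
import csv
import io

# Tab-separated design table; rows are invented for illustration.
DESIGN_TSV = (
    "Fraction_Group\tFraction\tSpectra_Filepath\tLabel\tSample\n"
    "1\t1\tf1.mzML\t1\t1\n"
    "1\t2\tf2.mzML\t1\t1\n"
    "2\t1\tf3.mzML\t1\t2\n"
    "2\t2\tf4.mzML\t1\t2\n"
)


def summarize_design(text):
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    return {
        "n_samples": len({row["Sample"] for row in rows}),
        "n_ms_files": len({row["Spectra_Filepath"] for row in rows}),
        # More than one distinct fraction means the design is fractionated.
        "is_fractionated": len({row["Fraction"] for row in rows}) > 1,
    }


summary = summarize_design(DESIGN_TSV)
print(summary)
```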
from openms_python import Py_ExperimentalDesign
import pandas as pd
# Load an experimental design from a TSV file
design = Py_ExperimentalDesign.from_file("design.tsv")
# Quick access to design properties
print(f"Samples: {design.n_samples}")
print(f"MS files: {design.n_ms_files}")
print(f"Fractionated: {design.is_fractionated}")
# Get a summary
design.print_summary()
# Convert to pandas DataFrame for analysis
df = design.to_dataframe()
# Create from a pandas DataFrame
df = pd.DataFrame({
'Fraction_Group': [1, 1, 2, 2],
'Fraction': [1, 2, 1, 2],
'Spectra_Filepath': ['f1.mzML', 'f2.mzML', 'f3.mzML', 'f4.mzML'],
'Label': [1, 1, 1, 1],
'Sample': [1, 1, 2, 2]
})
design = Py_ExperimentalDesign.from_dataframe(df)
# Store to file
design.store("output_design.tsv")
# Create from existing OpenMS objects
from openms_python import Py_ConsensusMap
consensus = Py_ConsensusMap.from_file("results.consensusXML")
design = Py_ExperimentalDesign.from_consensus_map(consensus)

All sequence-like wrappers (feature maps, consensus maps, identification containers,
and experiments) support Python's iteration protocol. Metadata-aware wrappers such
as `Py_Feature` and `Py_MSSpectrum` expose their meta values like a
regular mapping, so you can loop over keys or call `len()`.
from openms_python import Py_Feature, Py_FeatureMap, Identifications
# Assume ``feature_df`` is a pandas DataFrame with feature columns
feature_df = ...
fmap = Py_FeatureMap.from_dataframe(feature_df)
for feature in fmap:
    print("Feature UID", feature.getUniqueId())
ids = Identifications.from_idxml("results.idXML")
for peptide in ids:  # equivalent to iterating over ids.peptide_identifications
    print(peptide.getIdentifier())
feature = Py_Feature()
feature["label"] = "sample_a"
for key in feature:
    print(key, feature[key])

The `Py_AASequence` wrapper provides a Pythonic interface to amino acid sequences with support for common operations like sequence reversal and shuffling for decoy generation. All operations delegate to pyOpenMS functionality to minimize reimplementation.
from openms_python import Py_AASequence
# Create a sequence from string
seq = Py_AASequence.from_string("PEPTIDERK")
# Access properties
print(f"Sequence: {seq.sequence}") # PEPTIDERK
print(f"Length: {len(seq)}") # 9
print(f"Mono weight: {seq.mono_weight:.2f} Da") # 1083.56 Da
print(f"Formula: {seq.formula}") # C46H77N13O17
# Iterate over amino acids
for aa in seq:
    print(aa)  # P, E, P, T, I, D, E, R, K
# Generate decoy sequences
reversed_seq = seq.reverse()
print(reversed_seq.sequence) # KREDITPEP
# Reverse with enzyme constraint (preserves cleavage sites)
reversed_enzyme = seq.reverse_with_enzyme("Trypsin")
print(reversed_enzyme.sequence) # EDITPEPRK
# Shuffle with reproducible seed
shuffled = seq.shuffle(enzyme="Trypsin", seed=42)
print(shuffled.sequence) # IPEDTEPRK (same with seed=42)
# Calculate m/z for different charge states
mz1 = seq.get_mz(1) # 1084.56
mz2 = seq.get_mz(2) # 542.79
mz3 = seq.get_mz(3) # 362.19
# Query sequence content
has_tide = seq.has_substring("TIDE") # True
starts_pep = seq.has_prefix("PEP") # True
ends_rk = seq.has_suffix("RK") # True
# Access individual residues
first_aa = seq[0] # "P"
# Work with modified sequences
mod_seq = Py_AASequence.from_string("PEPTIDEM(Oxidation)K")
print(f"Is modified: {mod_seq.is_modified}") # True
print(f"Unmodified: {mod_seq.unmodified_sequence}")  # PEPTIDEMK

Properties:
- `sequence`: Full sequence string with modifications
- `unmodified_sequence`: Sequence without modifications
- `mono_weight`: Monoisotopic weight in Da
- `average_weight`: Average weight in Da
- `formula`: Molecular formula
- `is_modified`: Whether sequence has modifications
- `has_n_terminal_modification`: N-terminal modification status
- `has_c_terminal_modification`: C-terminal modification status
- `native`: Access to underlying pyOpenMS AASequence

Methods:
- `from_string(sequence_str)`: Create from string (class method)
- `reverse()`: Reverse entire sequence
- `reverse_with_enzyme(enzyme)`: Reverse peptides between cleavage sites
- `shuffle(enzyme, max_attempts, seed)`: Shuffle with enzyme constraints
- `get_mz(charge)`: Calculate m/z for charge state
- `has_substring(substring)`: Check for substring
- `has_prefix(prefix)`: Check for prefix
- `has_suffix(suffix)`: Check for suffix
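The difference between `reverse()` and `reverse_with_enzyme("Trypsin")` can be reproduced with a few lines of plain Python. This is a sketch under a simplified trypsin rule (cleave after K or R unless the next residue is P); the wrapper itself delegates to pyOpenMS, and the helper names below are hypothetical.

```python
import re


def tryptic_peptides(sequence):
    """Split after K or R, except when the next residue is P."""
    return [part for part in re.split(r"(?<=[KR])(?!P)", sequence) if part]


def reverse_with_trypsin(sequence):
    """Reverse each peptide but keep its C-terminal cleavage residue in place."""
    reversed_parts = []
    for peptide in tryptic_peptides(sequence):
        if peptide[-1] in "KR":
            reversed_parts.append(peptide[:-1][::-1] + peptide[-1])
        else:
            # Trailing piece that does not end at a cleavage site
            reversed_parts.append(peptide[::-1])
    return "".join(reversed_parts)


print("PEPTIDERK"[::-1])                  # plain reversal: KREDITPEP
print(reverse_with_trypsin("PEPTIDERK"))  # enzyme-aware: EDITPEPRK
```

Keeping the cleavage residues in place means the decoy digests into peptides of the same lengths as the target sequence.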
# Access individual spectra
spec = exp[0]
# Access multiple spectra with slicing
first_10 = exp[0:10] # First 10 spectra
last_5 = exp[-5:] # Last 5 spectra
every_other = exp[::2] # Every other spectrum
second_to_fourth = exp[1:4]  # Spectra 2-4 (indices 1, 2, 3)
print(f"First spectrum: {spec}")
print(f"First 10 spectra: {len(first_10)} spectra")
print(f"Last 5 spectra: {len(last_5)} spectra")
# Pythonic properties
print(f"Retention time: {spec.retention_time} seconds")
print(f"MS level: {spec.ms_level}")
print(f"Number of peaks: {len(spec)}")
print(f"Total ion current: {spec.total_ion_current}")
# Boolean helpers
if spec.is_ms1:
    print("This is an MS1 spectrum")
# Get peaks as NumPy arrays
mz, intensity = spec.peaks
# Or as a DataFrame
peaks_df = spec.to_dataframe()
print(peaks_df.head())

Chromatograms (e.g., extracted ion chromatograms, total ion chromatograms) are supported with a Pythonic interface similar to spectra.
from openms_python import Py_MSExperiment, Py_MSChromatogram
import pandas as pd
# Load experiment with chromatograms
exp = Py_MSExperiment.from_file('data.mzML')
# Check chromatogram count
print(f"Chromatograms: {exp.chromatogram_count}")
# Access individual chromatograms
if exp.chromatogram_count > 0:
    chrom = exp.get_chromatogram(0)
    print(f"MZ: {chrom.mz:.4f}")
    print(f"Name: {chrom.name}")
    print(f"Data points: {len(chrom)}")
    print(f"RT range: {chrom.rt_range}")
    print(f"TIC: {chrom.total_ion_current:.2e}")
# Iterate over all chromatograms
for chrom in exp.chromatograms():
    print(f"Chromatogram: MZ={chrom.mz:.4f}, Points={len(chrom)}")
# Create chromatogram from DataFrame
df = pd.DataFrame({
'rt': [10.0, 20.0, 30.0, 40.0],
'intensity': [1000.0, 5000.0, 3000.0, 500.0]
})
chrom = Py_MSChromatogram.from_dataframe(
df,
mz=445.12,
name="XIC m/z 445.12",
native_id="chromatogram=1"
)
# Add to experiment
exp.add_chromatogram(chrom)
# Chromatogram properties
print(f"MZ: {chrom.mz:.4f}")
print(f"Max intensity: {chrom.max_intensity}")
print(f"RT at max: {chrom.rt[chrom.intensity.argmax()]:.2f}")
# Get data as arrays
rt, intensity = chrom.data
# Or individually
rt_values = chrom.rt
intensity_values = chrom.intensity
# Convert to DataFrame
chrom_df = chrom.to_dataframe()
# Filter chromatogram by RT
filtered = chrom.filter_by_rt(min_rt=15.0, max_rt=35.0)
# Normalize intensities
normalized = chrom.normalize_intensity(max_value=100.0)
normalized_tic = chrom.normalize_to_tic()
# Metadata access
chrom["sample_id"] = "Sample_A"
chrom["replicate"] = 1
print(chrom.get("sample_id"))
### Ion Mobility Support
`openms-python` provides comprehensive support for ion mobility data through float data arrays and mobilograms.
#### Float Data Arrays
Spectra can have additional data arrays (e.g., ion mobility values) associated with each peak:
```python
from openms_python import Py_MSSpectrum
import pandas as pd
import numpy as np

# Create a spectrum with ion mobility data
df = pd.DataFrame({
    'mz': [100.0, 200.0, 300.0],
    'intensity': [50.0, 100.0, 75.0],
    'ion_mobility': [1.5, 2.3, 3.1]
})
spec = Py_MSSpectrum.from_dataframe(df, retention_time=60.5, ms_level=1)

# Access ion mobility values
print(spec.ion_mobility)  # array([1.5, 2.3, 3.1])

# Set ion mobility values
spec.ion_mobility = np.array([1.6, 2.4, 3.2])

# Convert to DataFrame with float arrays
df = spec.to_dataframe(include_float_arrays=True)
print(df)
#       mz  intensity  ion_mobility
# 0  100.0       50.0           1.6
# 1  200.0      100.0           2.4
# 2  300.0       75.0           3.2
```

#### Mobilograms

Mobilograms represent the ion mobility dimension, showing intensity vs. drift time for a specific m/z.
Note: OpenMS C++ has a native Mobilogram class that may not yet be wrapped in pyopenms. This wrapper uses MSChromatogram as the underlying representation for mobilogram data.
from openms_python import Py_Mobilogram
import numpy as np
import pandas as pd
# Create a mobilogram from arrays
drift_times = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
intensities = np.array([100.0, 150.0, 200.0, 180.0, 120.0])
mob = Py_Mobilogram.from_arrays(drift_times, intensities, mz=500.0)
print(f"m/z: {mob.mz}")
print(f"Points: {len(mob)}")
print(f"Base peak drift time: {mob.base_peak_drift_time}")
# Convert to DataFrame
df = mob.to_dataframe()
print(df.head())
# drift_time intensity mz
# 0 1.0 100.0 500.0
# 1 1.5 150.0 500.0
# 2 2.0 200.0 500.0
# Create from DataFrame
df = pd.DataFrame({
'drift_time': [1.0, 2.0, 3.0],
'intensity': [50.0, 100.0, 75.0]
})
mob = Py_Mobilogram.from_dataframe(df, mz=600.0)

openms_python also exposes opinionated utilities that combine the primitive
wrappers into streaming pipelines and end-to-end quantitation helpers:
from openms_python import (
Py_MSExperiment,
Identifications,
ProteinStream,
map_identifications_to_features,
align_feature_maps,
link_features,
export_quant_table,
)
# 1) Detect features in an experiment
experiment = Py_MSExperiment.from_file("run.mzML")
feature_map = experiment.detect_features()
# 2) Map identifications and filter them by FDR
identifications = Identifications.from_idxml("search_results.idXML")
filtered = identifications.filter_by_fdr(threshold=0.01)
annotated = map_identifications_to_features(feature_map, filtered)
# 3) Align multiple maps and link them into a consensus representation
# (`second_run` stands for another annotated feature map prepared the same way)
aligned = align_feature_maps([annotated, second_run])
consensus = link_features(
aligned,
grouping_method="qt", # or "kd" / "labeled" / "unlabeled"
params={"distance_RT:max_difference": 100.0}
)
# 4) Export a tidy quantitation table (per-sample intensities)
quant_df = export_quant_table(consensus)
# Bonus: streaming FASTA digestion & theoretical spectra generation
for record in ProteinStream.from_fasta("proteins.fasta").digest().theoretical_spectra():
    print(record.protein.identifier, record.peptide.toString(), len(record.spectrum))

import pyopenms as oms

# Peak picking with plain pyOpenMS
picker = oms.PeakPickerHiRes()
params = picker.getDefaults()
params.setValue("signal_to_noise", 3.0)
picker.setParameters(params)
centroided = oms.MSExperiment(exp)
picker.pickExperiment(exp, centroided, True)

from openms_python import Py_MSExperiment

# The same with openms-python: choose from multiple peak picking algorithms
centroided = exp.pick_peaks(method="hires", params={"signal_to_noise": 3.0})
# Available methods: "hires" (default), "cwt", "iterative"
# or modify in-place
exp.pick_peaks(inplace=True)

# Apply a GaussFilter with a custom width
smoothed = exp.smooth_gaussian(gaussian_width=0.1)
# Smooth only MS2 spectra in-place using Savitzky-Golay
exp.smooth_savitzky_golay(ms_levels=2, inplace=True, frame_length=7)

# Iterate over MS1 spectra only
for spec in exp.ms1_spectra():
    print(f"MS1 at RT={spec.retention_time:.2f}s")
# Iterate over MS2 spectra only
for spec in exp.ms2_spectra():
    print(f"MS2: precursor m/z = {spec.precursor_mz:.4f}")
# Filter by retention time
for spec in exp.rt_filter[100:200]:
    print(f"Spectrum at RT={spec.retention_time:.2f}s")

# Convert entire experiment to DataFrame
df = exp.to_dataframe(include_peaks=True)
print(df.head())
# Spectrum-level DataFrame
df_spectra = exp.to_dataframe(include_peaks=False)
# MS2 peaks only
df_ms2 = exp.to_dataframe(include_peaks=True, ms_level=2)

# Filter and process in a pipeline
filtered_exp = (exp
    .filter_by_ms_level(1)
    .filter_by_rt(100, 500)
    .filter_by_mz(400, 500)
    .filter_top_n_peaks(100))
# OR
filtered_exp = (exp
    .filter_by_ms_level(1)
    .rt_filter[100:500]
    .mz_filter[400:500]
    .filter_top_n_peaks(100))
print(f"After filtering: {len(filtered_exp)} spectra")

# Filter peaks by m/z
filtered_spec = spec.filter_by_mz(100, 500)
# OR
filtered_spec = spec.mz_filter[100:500]
# Get top N peaks
top_10 = spec.top_n_peaks(10)

# Save to mzML
exp.to_mzml('output.mzML')
# Or use convenience function
from openms_python import write_mzml
write_mzml(exp, 'output.mzML')

All high-level containers expose load/store helpers that infer the correct
pyOpenMS reader from the file extension (including `.gz`).
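Extension-based dispatch of this kind can be sketched with `pathlib`. The `HANDLERS` table and `infer_handler` function are hypothetical and only illustrate the idea of ignoring a trailing `.gz` before looking up the format; the wrappers' real lookup logic may differ.

```python
from pathlib import Path

# Hypothetical mapping from lower-cased file extension to a handler name.
HANDLERS = {
    ".mzml": "MzMLFile",
    ".featurexml": "FeatureXMLFile",
    ".consensusxml": "ConsensusXMLFile",
    ".idxml": "IdXMLFile",
}


def infer_handler(path):
    suffixes = [suffix.lower() for suffix in Path(path).suffixes]
    # Ignore a trailing compression suffix such as ".gz".
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    if not suffixes or suffixes[-1] not in HANDLERS:
        raise ValueError(f"Unsupported file type: {path}")
    return HANDLERS[suffixes[-1]]


print(infer_handler("run1.mzML"))
print(infer_handler("run1.featureXML.gz"))
```

Failing loudly on unknown extensions keeps a typo in a file name from silently selecting the wrong reader.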
import pyopenms as oms
exp = oms.MSExperiment()
oms.MzMLFile().load("input.mzML", exp)
oms.MzMLFile().store("output.mzML", exp)
fmap = oms.FeatureMap()
oms.FeatureXMLFile().load("input.featureXML", fmap)
oms.FeatureXMLFile().store("output.featureXML", fmap)

from openms_python import Py_MSExperiment, Py_FeatureMap, Py_ConsensusMap
exp = Py_MSExperiment().load("input.mzML")
exp.store("output.mzML")
feature_map = Py_FeatureMap().load("input.featureXML")
feature_map.store("output.featureXML")
cons_map = Py_ConsensusMap().load("input.consensusXML")
cons_map.store("output.consensusXML")

from openms_python.io import MzMLReader, MzMLWriter
# Reading with context manager
with MzMLReader('data.mzML') as exp:
    print(f"Loaded {len(exp)} spectra")
    for spec in exp.ms1_spectra():
        print(spec)
# Writing with context manager
with MzMLWriter('output.mzML') as writer:
    writer.write(exp)

import pyopenms as oms
class SpectrumCounter(oms.MSExperimentConsumer):
    def __init__(self):
        super().__init__()
        self.ms2 = 0
    def consumeSpectrum(self, spec):
        if spec.getMSLevel() == 2:
            self.ms2 += 1
consumer = SpectrumCounter()
oms.MzMLFile().transform("big.mzML", consumer)
print(f"Processed {consumer.ms2} MS2 spectra")

from openms_python import stream_mzml
with stream_mzml("big.mzML") as spectra:
    ms2 = sum(1 for spec in spectra if spec.ms_level == 2)
print(f"Processed {ms2} MS2 spectra")

All wrappers behave like mutable Python sequences.
# Append and extend
exp.append(new_spec).extend(other_exp)
# Remove by index or slice
exp.remove(-1)
del exp[::2]

feature_map.append(feature)
feature_map.extend(iter_of_features)
feature_map.remove(0)  # delete by index
cons_map.append(cons_feature)
del cons_map[-3:]

# DataFrame round-trip
df = feature_map.to_dataframe()
df["mz"] += 0.01 # manipulate with pandas
feature_map = Py_FeatureMap.from_dataframe(df)
cons_df = cons_map.get_df()
cons_df["quality"] = 0.95
cons_map = Py_ConsensusMap.from_df(cons_df)
peaks_df = exp.get_df()
peaks_df["intensity"] *= 1.1
exp = Py_MSExperiment.from_df(peaks_df)

Behind the scenes, the wrappers copy the retained entries back into the underlying pyOpenMS container, preserving metadata while exposing the expected Python semantics. By contrast, pyOpenMS requires manually creating a new container, copying every element except the ones you wish to remove, and reassigning the result.
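The copy-back approach described above can be sketched as follows. `FixedContainer` is a stand-in for a container that only supports reading and appending, and `SequenceWrapper` is a hypothetical wrapper; neither mirrors the actual openms-python implementation.

```python
class FixedContainer:
    """Stand-in for a container that can only be read and appended to."""

    def __init__(self, items=()):
        self._items = list(items)

    def size(self):
        return len(self._items)

    def get(self, index):
        return self._items[index]

    def push_back(self, item):
        self._items.append(item)


class SequenceWrapper:
    """Hypothetical wrapper adding `del wrapper[i]` and slice deletion."""

    def __init__(self, native):
        self.native = native

    def __len__(self):
        return self.native.size()

    def __iter__(self):
        return (self.native.get(i) for i in range(len(self)))

    def __delitem__(self, key):
        indices = range(len(self))
        # Resolve an int or a slice into the set of positions to drop.
        drop = set(indices[key]) if isinstance(key, slice) else {indices[key]}
        rebuilt = FixedContainer()
        for i in indices:
            if i not in drop:
                rebuilt.push_back(self.native.get(i))  # copy retained entries
        self.native = rebuilt  # swap in the rebuilt container


wrapped = SequenceWrapper(FixedContainer(["s0", "s1", "s2", "s3", "s4"]))
del wrapped[::2]  # drops s0, s2, s4
del wrapped[-1]   # drops s3
print(list(wrapped))
```

Indexing a `range` with the key lets Python handle negative indices, slices, and out-of-range errors for free.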
Any pyOpenMS type that derives from MetaInfoInterface (features, spectra,
consensus features, etc.) now behaves like a standard Python mapping for its
meta annotations:
import pyopenms as oms
from openms_python import Py_Feature, Py_MSSpectrum
feature = Py_Feature()
feature["label"] = "Sample1"
feature.update(condition="control", replicate="R1")
assert feature["label"] == "Sample1"
assert feature.get("missing", "n/a") == "n/a"
spectrum = Py_MSSpectrum(oms.MSSpectrum())
spectrum["IonInjectTime"] = 12.3
assert "IonInjectTime" in spectrum
spectrum.pop("IonInjectTime")

Internally this syntax delegates to the familiar `setMetaValue`,
`getMetaValue`, and `removeMetaValue` calls on the wrapped pyOpenMS object, so
no information is lost compared to the C++ interface.
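A mapping facade like this can be built on `collections.abc.MutableMapping`, which supplies `get`, `update`, `pop`, and `in` once five core methods exist. In the sketch below, `MetaStore` is a stand-in object; its `metaValueExists` and `getKeys` helpers are simplified for this example and should not be read as the exact pyOpenMS signatures. `MetaMapping` is a hypothetical name.

```python
from collections.abc import MutableMapping


class MetaStore:
    """Stand-in exposing meta-value methods on a plain dict."""

    def __init__(self):
        self._values = {}

    def setMetaValue(self, key, value):
        self._values[key] = value

    def getMetaValue(self, key):
        return self._values[key]

    def removeMetaValue(self, key):
        del self._values[key]

    def metaValueExists(self, key):  # simplified helper for this sketch
        return key in self._values

    def getKeys(self):  # simplified helper for this sketch
        return list(self._values)


class MetaMapping(MutableMapping):
    """Hypothetical mapping facade that delegates to the wrapped object."""

    def __init__(self, native):
        self.native = native

    def __getitem__(self, key):
        if not self.native.metaValueExists(key):
            raise KeyError(key)
        return self.native.getMetaValue(key)

    def __setitem__(self, key, value):
        self.native.setMetaValue(key, value)

    def __delitem__(self, key):
        if not self.native.metaValueExists(key):
            raise KeyError(key)
        self.native.removeMetaValue(key)

    def __iter__(self):
        return iter(self.native.getKeys())

    def __len__(self):
        return len(self.native.getKeys())


meta = MetaMapping(MetaStore())
meta["label"] = "Sample1"
meta.update(condition="control", replicate="R1")  # provided by MutableMapping
print(meta.get("missing", "n/a"), "label" in meta, len(meta))
meta.pop("label")
print(sorted(meta))
```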
import pandas as pd
from openms_python import Py_MSSpectrum
# From DataFrame
df = pd.DataFrame({
    'mz': [100.0, 200.0, 300.0],
    'intensity': [50.0, 100.0, 75.0]
})
spec = Py_MSSpectrum.from_dataframe(
    df,
    retention_time=120.5,
    ms_level=1,
    native_id='spectrum=1'
)

# Create experiment from grouped data
df = pd.DataFrame({
    'spectrum_id': [0, 0, 1, 1, 2, 2],
    'mz': [100, 200, 150, 250, 120, 220],
    'intensity': [50, 100, 60, 110, 55, 105],
    'retention_time': [10.0, 10.0, 20.0, 20.0, 30.0, 30.0],
    'ms_level': [1, 1, 1, 1, 1, 1]
})
exp = Py_MSExperiment.from_dataframe(df)

from openms_python import Py_MSExperiment
import pandas as pd
import matplotlib.pyplot as plt
# Load data
exp = Py_MSExperiment.from_file('data.mzML')
# Get MS2 spectra as DataFrame
df_ms2 = exp.to_dataframe(include_peaks=True, ms_level=2)
# Analyze precursor distribution
precursor_stats = df_ms2.groupby('precursor_mz').agg({
'intensity': 'sum',
'spectrum_index': 'count'
}).rename(columns={'spectrum_index': 'n_spectra'})
print(precursor_stats.head())
# Plot TIC over time
df_spectra = exp.to_dataframe(include_peaks=False)
plt.figure(figsize=(10, 4))
plt.plot(df_spectra['retention_time'], df_spectra['total_ion_current'])
plt.xlabel('Retention Time (s)')
plt.ylabel('Total Ion Current')
plt.title('TIC over time')
plt.show()

### Py_MSExperiment

Properties:
- `n_spectra`: Number of spectra
- `nr_chromatograms`: Number of chromatograms
- `chromatogram_count`: Alias for `nr_chromatograms`
- `rt_range`: Tuple of (min_rt, max_rt)
- `ms_levels`: Set of MS levels present

Methods:
- `from_file(filepath)`: Load from mzML file (class method)
- `from_dataframe(df, group_by)`: Create from DataFrame (class method)
- `to_file(filepath)`: Save to mzML file
- `to_dataframe(include_peaks, ms_level)`: Convert to DataFrame
- `ms1_spectra()`: Iterator over MS1 spectra
- `ms2_spectra()`: Iterator over MS2 spectra
- `spectra_by_level(level)`: Iterator over specific MS level
- `spectra_in_rt_range(min_rt, max_rt)`: Iterator over RT range
- `get_chromatogram(index)`: Get chromatogram by index
- `chromatograms()`: Iterator over all chromatograms
- `add_chromatogram(chromatogram)`: Add a chromatogram to the experiment
- `filter_by_ms_level(level)`: Filter by MS level
- `filter_by_rt(min_rt, max_rt)`: Filter by RT range
- `filter_top_n_peaks(n)`: Keep top N peaks per spectrum
- `summary()`: Get summary statistics
- `print_summary()`: Print formatted summary
### Py_MSChromatogram

Properties:
- `mz`: m/z value for the chromatogram
- `name`: Chromatogram name
- `native_id`: Native chromatogram ID
- `chromatogram_type`: Type of chromatogram
- `rt_range`: Tuple of (min_rt, max_rt)
- `min_rt`: Minimum retention time
- `max_rt`: Maximum retention time
- `min_intensity`: Minimum intensity
- `max_intensity`: Maximum intensity
- `total_ion_current`: Sum of all intensities
- `data`: Tuple of (rt_array, intensity_array)
- `rt`: RT values as NumPy array
- `intensity`: Intensity values as NumPy array

Methods:
- `from_numpy(rt, intensity, **metadata)`: Create from NumPy arrays (class method)
- `from_dataframe(df, **metadata)`: Create from DataFrame (class method)
- `to_numpy()`: Convert to NumPy arrays
- `to_dataframe()`: Convert to DataFrame
- `filter_by_rt(min_rt, max_rt)`: Filter data points by RT range
- `filter_by_intensity(min_intensity)`: Filter by minimum intensity
- `normalize_intensity(max_value)`: Normalize intensities to max value
- `normalize_to_tic()`: Normalize so intensities sum to 1.0
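The two normalization modes listed above differ only in the divisor. The following plain-Python sketch shows the arithmetic on the intensities from the chromatogram example; the function bodies are an illustration of the described behaviour, not the wrapper's implementation, and they assume a non-empty input with a positive maximum.

```python
def normalize_intensity(intensities, max_value=1.0):
    """Scale so the largest intensity equals max_value."""
    peak = max(intensities)
    return [value * max_value / peak for value in intensities]


def normalize_to_tic(intensities):
    """Scale so the intensities sum to 1.0 (TIC = total ion current)."""
    total = sum(intensities)
    return [value / total for value in intensities]


intensity = [1000.0, 5000.0, 3000.0, 500.0]
scaled = normalize_intensity(intensity, max_value=100.0)
fractions = normalize_to_tic(intensity)
print(scaled)
print(round(sum(fractions), 6))
```

Max-scaling keeps relative peak heights easy to read, while TIC-scaling makes chromatograms with different overall signal comparable.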
### Py_AASequence

Properties:
- `sequence`: Full sequence string with modifications
- `unmodified_sequence`: Sequence without modifications
- `mono_weight`: Monoisotopic weight in Da
- `average_weight`: Average weight in Da
- `formula`: Molecular formula
- `is_modified`: Whether sequence has modifications
- `has_n_terminal_modification`: N-terminal modification status
- `has_c_terminal_modification`: C-terminal modification status
- `native`: Access to underlying pyOpenMS AASequence

Methods:
- `from_string(sequence_str)`: Create from string (class method)
- `reverse()`: Reverse entire sequence
- `reverse_with_enzyme(enzyme)`: Reverse peptides between enzymatic cleavage sites
- `shuffle(enzyme, max_attempts, seed)`: Shuffle peptides with enzyme constraints
- `get_mz(charge)`: Calculate m/z for given charge state
- `has_substring(substring)`: Check if sequence contains substring
- `has_prefix(prefix)`: Check if sequence starts with prefix
- `has_suffix(suffix)`: Check if sequence ends with suffix
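The m/z values returned by `get_mz(charge)` follow the usual relation for protonated ions, m/z = (M + z * m_proton) / z. The sketch below reproduces the numbers quoted for PEPTIDERK earlier in this README; the rounded proton mass and the helper name are assumptions of this example, and the wrapper itself delegates the calculation to pyOpenMS.

```python
PROTON_MASS = 1.007276  # Da, rounded


def mz_from_mono_weight(mono_weight, charge):
    """m/z of the [M + zH]^z+ ion for a neutral monoisotopic weight."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (mono_weight + charge * PROTON_MASS) / charge


# Monoisotopic weight of PEPTIDERK; rounds to the 1083.56 Da shown earlier.
mono_weight = 1083.556
for z in (1, 2, 3):
    print(z, round(mz_from_mono_weight(mono_weight, z), 2))
```

As the charge grows, m/z approaches M / z plus one proton mass, which is why the 2+ and 3+ values are not simple fractions of the 1+ value.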
### Py_MSSpectrum

Properties:
- `retention_time`: RT in seconds
- `ms_level`: MS level (1, 2, etc.)
- `is_ms1`, `is_ms2`: Boolean helpers
- `precursor_mz`: Precursor m/z (MS2+)
- `precursor_charge`: Precursor charge (MS2+)
- `native_id`: Native spectrum ID
- `total_ion_current`: Sum of intensities
- `base_peak_mz`: m/z of most intense peak
- `base_peak_intensity`: Intensity of base peak
- `peaks`: Tuple of (mz_array, intensity_array)
- `float_data_arrays`: List of FloatDataArray objects
- `ion_mobility`: Ion mobility values as NumPy array
- `drift_time`: Spectrum-level drift time value

Methods:
- `from_dataframe(df, **metadata)`: Create from DataFrame (class method)
- `to_dataframe(include_float_arrays=True)`: Convert to DataFrame
- `filter_by_mz(min_mz, max_mz)`: Filter peaks by m/z
- `filter_by_intensity(min_intensity)`: Filter peaks by intensity
- `top_n_peaks(n)`: Keep top N peaks
- `normalize_intensity(max_value)`: Normalize intensities
### Py_Mobilogram

Properties:
- `name`: Name of the mobilogram
- `mz`: m/z value this mobilogram represents
- `drift_time`: Drift time values as NumPy array
- `intensity`: Intensity values as NumPy array
- `peaks`: Tuple of (drift_time_array, intensity_array)
- `total_ion_current`: Sum of intensities
- `base_peak_drift_time`: Drift time of most intense point
- `base_peak_intensity`: Intensity of base peak

Methods:
- `from_arrays(drift_time, intensity, mz=None, name=None)`: Create from arrays (class method)
- `from_dataframe(df, **metadata)`: Create from DataFrame (class method)
- `to_dataframe()`: Convert to DataFrame
Identifications combines protein and peptide search results and keeps the two collections synchronized.
Constructors / IO:
- `Identifications.from_idxml(path)`: Load protein & peptide IDs from an idXML file
- `Identifications.store(path)` / `to_idxml(path)`: Save to idXML

Convenience helpers:
- `Identifications.summary()`: Quick counts of IDs and hits
- `Identifications.find_protein(...)`, `find_peptide(...)`: Search by identifier
- `Identifications.find_protein_by_accession(...)`: Look up proteins by accession
- `Identifications.find_peptide_by_sequence(...)`: Case-insensitive peptide search
- `Identifications.filter_peptides_by_score(threshold)`: Return a copy with filtered peptides
- `Identifications.peptides_for_protein(accession)`: Retrieve peptides linked to a protein
Underlying containers (ProteinIdentifications, PeptideIdentifications) behave
like Python sequences: they support slicing, appending, iterating, and provide
convenience methods such as find_by_identifier, find_by_accession, and
filter_by_score for quickly triaging search hits.
git clone https://github.com/openms/openms-python.git
cd openms-python
pip install -e ".[dev]"

| Feature | pyOpenMS | openms-python |
|---|---|---|
| Get spectrum count | `exp.getNrSpectra()` | `len(exp)` |
| Get chromatogram count | `exp.getNrChromatograms()` | `exp.chromatogram_count` |
| Get retention time | `spec.getRT()` | `spec.retention_time` |
| Get chromatogram m/z | `chrom.getMZ()` | `chrom.mz` |
| Check MS1 | `spec.getMSLevel() == 1` | `spec.is_ms1` |
| Load file | `MzMLFile().load(path, exp)` | `exp = Py_MSExperiment.from_file(path)` |
| Iterate MS1 | Manual loop + level check | `for spec in exp.ms1_spectra():` |
| Iterate chromatograms | Manual loop + range check | `for chrom in exp.chromatograms():` |
| Peak data | `peaks = spec.get_peaks(); mz = peaks[0]` | `mz, intensity = spec.peaks` |
| DataFrame | Not available | `df = exp.to_dataframe()` |
| Create sequence | `oms.AASequence.fromString("PEP")` | `Py_AASequence.from_string("PEP")` |
| Get sequence weight | `seq.getMonoWeight()` | `seq.mono_weight` |
| Reverse sequence | `DecoyGenerator().reverseProtein(seq)` | `seq.reverse()` |
| Iterate residues | Manual loop with `getResidue(i)` | `for aa in seq:` |

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the BSD-3-Clause License - see the LICENSE file for details.
If you use openms-python in your research, please cite:
Röst HL, Sachsenberg T, Aiche S, et al. OpenMS: a flexible open-source software platform
for mass spectrometry data analysis. Nat Methods. 2016;13(9):741-748.
- Documentation: https://openms-python.readthedocs.io
- Issues: GitHub Issues
- Discussions: GitHub Discussions