cartilage-ftw/active-learning

An active learning scheme for optimizing protein sequences

We demonstrate how the wazy package (Yang et al. 2022, developed by members of the White lab at the University of Rochester) can be trained on protein sequence-property prediction tasks. To generate the training data, we ran coarse-grained simulations using HOOMD-blue 2.9.7 extended with azplugins. The simulations were run on the MOGON II computing cluster of JGU Mainz.

Here, we provide the code used for training and the results of the simulations (extracted quantities, e.g. $B_{22}$ or $\Delta G$ for protein sequences), along with the scripts used for generating the simulations and computing the aforementioned quantities.

The code presented here was used in the study by Changiarath, Arya, Xenidis, Padeken, Stelzl 2024, under review for Faraday Discussions.

Dependencies

Our code builds on wazy, which featurizes sequences using UniRep and implements its own Bayesian optimization scheme with MLPs as the surrogate model. We use localCIDER for computing sequence descriptors.

Note that metapredict has many dependencies and downloads large libraries during installation.

pip install cython metapredict wazy localcider
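
After installation, a quick check that the packages can be located may save a failed run later. The following is a minimal sketch, assuming the importable module names match the pip package names (except Cython, whose module name is capitalized); the helper `missing_modules` is a hypothetical name introduced here and is not part of this repository.

```python
from importlib.util import find_spec

# Assumed import names for the packages installed above.
REQUIRED = ["Cython", "metapredict", "wazy", "localcider"]


def missing_modules(names):
    """Return the names in `names` that Python cannot locate for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only looks the modules up without importing them, so it does not trigger any downloads that a package might perform on first import.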

Calculation of second virial coefficient

The second virial coefficient ($B_{22}$) is a key indicator of protein self-interactions in solution. It quantifies the net pairwise interaction between protein molecules and is directly related to the radial distribution function $g(r)$. A positive $B_{22}$ indicates net repulsion, a negative $B_{22}$ net attraction. The second virial coefficient is defined as:

$$ B_{22} = - 2 \pi \int_0^{\infty} \left[ \exp\left(\frac{-U(r)}{k_B T}\right) - 1 \right] r^2 dr, $$

where $U(r)$ is the interaction potential between two molecules at separation $r$, $k_B$ is Boltzmann's constant, and $T$ is the temperature.
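
To make the definition concrete, the integral can be evaluated numerically for a model potential. The sketch below is an illustration only and not the code used in this repository: the function names, the hard-sphere test potential, the cutoff `r_max`, and the grid size are all assumptions. It uses pure Python and a trapezoidal rule, and compares against the analytic hard-sphere result $B_{22} = 2\pi\sigma^3/3$.

```python
import math


def trapezoid(xs, ys):
    """Composite trapezoidal rule for samples ys on the grid xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def b22_from_potential(potential, kT, r_max, n=20001):
    """Approximate B22 = -2*pi * int_0^r_max [exp(-U(r)/kT) - 1] r^2 dr."""
    rs = [r_max * i / (n - 1) for i in range(n)]
    integrand = [(math.exp(-potential(r) / kT) - 1.0) * r * r for r in rs]
    return -2.0 * math.pi * trapezoid(rs, integrand)


def hard_sphere(r, sigma=1.0):
    """Hypothetical test potential: infinite inside sigma, zero outside."""
    return math.inf if r < sigma else 0.0


# Reduced units (sigma = 1, kT = 1); r_max only needs to exceed sigma here.
b22 = b22_from_potential(hard_sphere, kT=1.0, r_max=2.0)
exact = 2.0 * math.pi / 3.0  # analytic hard-sphere value for sigma = 1
print(f"numerical: {b22:.4f}  analytic: {exact:.4f}")
```

The purely repulsive hard-sphere potential gives a positive value; for a potential with a strongly attractive region, the exponential can grow large and the upper cutoff has to be chosen beyond the range of the interaction.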

In the dilute limit, $g(r) \approx \exp(-U(r)/k_B T)$, so we compute $B_{22}$ from the radial distribution function $g(r)$:

$$ B_{22} = - 2 \pi \int_0^{\infty} \left[ g(r) - 1 \right] r^2 dr, $$

where:

  • $B_{22}$ is the second virial coefficient,
  • $g(r)$ is the radial distribution function,
  • $r$ is the distance from a reference particle.
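
The integral over $g(r)$ can be sketched for a tabulated radial distribution function as follows. This is a hedged illustration, not the script shipped in this repository: `b22_from_rdf` is a hypothetical name, the square-well $g(r)$ is synthetic data with made-up parameters, and the integration stops at the last tabulated distance because a simulation box only provides $g(r)$ up to a finite range.

```python
import math


def b22_from_rdf(rs, gs):
    """Approximate B22 = -2*pi * int [g(r) - 1] r^2 dr over the tabulated range.

    rs: increasing distances; gs: g(r) sampled at those distances.
    Uses the trapezoidal rule, so the grid may be nonuniform.
    """
    total = 0.0
    for i in range(len(rs) - 1):
        f0 = (gs[i] - 1.0) * rs[i] ** 2
        f1 = (gs[i + 1] - 1.0) * rs[i + 1] ** 2
        total += 0.5 * (f0 + f1) * (rs[i + 1] - rs[i])
    return -2.0 * math.pi * total


# Synthetic dilute-limit g(r) for a square-well fluid (hypothetical parameters):
# hard core of diameter sigma, well of depth eps out to lam * sigma.
sigma, lam, eps_over_kT = 1.0, 1.5, 1.0


def g_square_well(r):
    if r < sigma:
        return 0.0
    if r < lam * sigma:
        return math.exp(eps_over_kT)
    return 1.0


n = 30001
rs = [3.0 * i / (n - 1) for i in range(n)]
gs = [g_square_well(r) for r in rs]

b22 = b22_from_rdf(rs, gs)
# Analytic square-well result, used here only as a consistency check.
exact = (2.0 * math.pi * sigma ** 3 / 3.0) * (
    1.0 - (lam ** 3 - 1.0) * (math.exp(eps_over_kT) - 1.0)
)
print(f"numerical: {b22:.3f}  analytic: {exact:.3f}")
```

With these parameters the attractive well outweighs the excluded volume and $B_{22}$ comes out negative. The function accepts any indexable sequences, so NumPy arrays of $r$ and $g(r)$ can be passed in directly; the result has units of volume in whatever length unit $r$ is given.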
