Condition-specific regulations
- Getting Started
ConSReg depends on the following Python and R software:
- python == 3.6
- numpy == 1.16.2
- scipy == 1.1.0
- pandas == 0.21.1
- joblib >= 0.11
- rpy2 == 2.8.6
- networkx >= 2
- sklearn (scikit-learn) >= 0.19.1
- intervaltree == 2.1.0
- ChIPseeker == 1.16.1
- CoReg == 1.0.1
- gglasso == 1.4
- RRF == 1.9
- R >= 3.5.1
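To see how an existing Python environment compares with the pinned versions above, a small checker like the one below may help. It is a minimal sketch, not part of ConSReg: the helper names (`parse_version`, `satisfies`, `unmet_requirements`) are hypothetical, it only understands the `==` and `>=` operators used in this list, and it reads each package's `__version__` attribute. A package that imports but lacks that attribute is reported with an installed version of None.

```python
import re
import sys

# Python-side pins copied from the requirements list (R packages are checked separately).
REQUIREMENTS = {
    "numpy": "== 1.16.2",
    "scipy": "== 1.1.0",
    "pandas": "== 0.21.1",
    "joblib": ">= 0.11",
    "rpy2": "== 2.8.6",
    "networkx": ">= 2",
    "sklearn": ">= 0.19.1",
    "intervaltree": "== 2.1.0",
}


def parse_version(text):
    """Turn '1.16.2' into (1, 16, 2).

    Keeps the leading digits of each dot-separated part and stops at the first
    part without leading digits (e.g. '0.19.dev0' -> (0, 19)).
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, spec):
    """Return True if version string `installed` meets `spec` ('== X' or '>= X')."""
    op, required = spec.split()
    have, need = parse_version(installed), parse_version(required)
    if op == "==":
        # '== 3.6' accepts any 3.6.x release, so compare only the pinned components.
        return have[:len(need)] == need
    if op == ">=":
        # Pad with zeros so that 0.19 is treated as 0.19.0.
        width = max(len(have), len(need))
        have += (0,) * (width - len(have))
        need += (0,) * (width - len(need))
        return have >= need
    raise ValueError("unsupported operator: " + op)


def unmet_requirements(installed_versions):
    """Map package -> (installed, spec) for every pin that is missing or not satisfied."""
    problems = {}
    for name, spec in REQUIREMENTS.items():
        version = installed_versions.get(name)
        if version is None or not satisfies(version, spec):
            problems[name] = (version, spec)
    return problems


if __name__ == "__main__":
    python_version = "%d.%d.%d" % sys.version_info[:3]
    print("python", python_version, "meets == 3.6:", satisfies(python_version, "== 3.6"))
    installed = {}
    for name in REQUIREMENTS:
        try:
            module = __import__(name)
        except Exception:  # not installed, or (for rpy2) R could not be loaded
            continue
        installed[name] = getattr(module, "__version__", None)
    for name, (have, want) in sorted(unmet_requirements(installed).items()):
        print(name, "installed:", have, "required:", want)
```

Treating `== 3.6` as "any 3.6.x release" mirrors how the list pins only the minor version of Python.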
We provide four ways to install ConSReg and its dependencies:
- install ConSReg by Anaconda (with environment.yml)
- install ConSReg by Anaconda (manual installation)
- singularity image
- manual installation of all dependencies
Users only need to choose whichever option works for them. We suggest starting with option 1 or 3, as these two options are the simplest and most straightforward.
Since ConSReg depends on both Python and R packages, we recommend installing it with Anaconda, which makes it easy to set up the running environment. You may download Anaconda from its official website and install the version for your OS.
Once Anaconda is installed on your OS, create a new conda environment using the environment.yml file in this repository:
conda env create -f environment.yml
Alternatively, you can create the environment directly from the file hosted on GitHub:
conda env create -f https://raw.githubusercontent.com/LiLabAtVT/ConSReg/master/environment.yml
This will create a new conda environment named consreg that contains the ConSReg package and all its dependencies. Type conda activate consreg to activate this environment and conda deactivate to deactivate it. For more information, please refer to the official Anaconda documentation.
Alternatively, after Anaconda is installed on your OS (see 1.3.1), run the following commands to create a new environment and install ConSReg and all its dependencies into it:
conda create -y -n consreg python=3.6 # The new environment name is 'consreg'. You may use other name instead.
conda activate consreg
conda install -y -c bioconda --no-channel-priority bioconductor-chipseeker
conda install -y --no-channel-priority r-base r-essentials
conda install -y --no-channel-priority -c conda-forge r-gglasso r-rrf r-devtools
pip install ConSReg
The ConSReg environment can then be activated with conda activate consreg and deactivated with conda deactivate.
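Because ConSReg and its R dependencies live inside the consreg environment, imports will typically fail if that environment is not active. A quick check from Python might look like this sketch; it relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets, and the function name `active_conda_env` is hypothetical. If you created the environment under a different name, compare against that name instead.

```python
import os


def active_conda_env(environ=None):
    """Return the active conda environment name (from CONDA_DEFAULT_ENV), or None."""
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    env = active_conda_env()
    if env != "consreg":  # use your own name if you created the environment differently
        print("Active conda environment: %r; run 'conda activate consreg' first." % env)
    else:
        print("consreg environment is active.")
```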
Singularity is a container system that creates lightweight containers hosting all system dependencies and the environment for a given software package. Users can simply pull a container image from the cloud and run the program inside the container without having to install it on their own machines. You may install Singularity by following the official installation instructions.
To install ConSReg using Singularity, you may simply pull our prebuilt singularity image for ConSReg:
singularity pull -U library://alexsong0374/consreg/consreg_ubuntu20.04
This will create an image file called "consreg_ubuntu20.04_latest.sif" locally. To start a Python environment with ConSReg installed, run the local container:
singularity run consreg_ubuntu20.04_latest.sif python3
Alternatively, you may run Jupyter Notebook inside the container:
singularity run consreg_ubuntu20.04_latest.sif jupyter notebook
Additionally, we provide the Singularity definition file (consreg_singularity.def) in this repository for rebuilding the ConSReg container. You may rebuild the container yourself and add any other packages you want.
If all of the above steps fail, you may manually install all dependencies by following the steps below.
If R is not already installed, you may follow these steps to build R from source code. Otherwise, skip this section and start from 1.2.2.
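To find out whether an R installation is already available and whether it meets the R >= 3.5.1 requirement, a check along these lines may help. It is a sketch that assumes `Rscript` (installed alongside R) is on your `PATH`; the function names are hypothetical, and it only uses `subprocess` arguments available in Python 3.6. Note that it does not tell you whether an existing R was built with the shared library that rpy2 needs.

```python
import re
import shutil
import subprocess

MIN_R_VERSION = (3, 5, 1)  # "R >= 3.5.1" from the requirements list


def parse_r_version(text):
    """Return the first 'X.Y.Z' in text as a tuple, e.g. '3.6.1' -> (3, 6, 1), else None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_r_version():
    """Ask Rscript for R's version; return None if Rscript is not on PATH."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    # getRversion() is base R; cat() prints it without R's "[1]" prefix.
    result = subprocess.run(
        [rscript, "-e", "cat(as.character(getRversion()))"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_r_version(result.stdout)


if __name__ == "__main__":
    version = installed_r_version()
    if version is None:
        print("No R found on PATH: build R from source as described in this section.")
    elif version >= MIN_R_VERSION:
        print("R %d.%d.%d found and meets R >= 3.5.1." % version)
    else:
        print("R %d.%d.%d is older than 3.5.1; consider building a newer R." % version)
```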
First, disable any conda environment, if there is an active one.
conda deactivate
Download the R source code from CRAN (https://cran.r-project.org/). Any recent version should work as long as it meets the requirement listed above (R >= 3.5.1); in particular, a version newer than 3.0.0 is needed for rpy2 to work correctly with R.
# Download R 3.6.1
wget https://cran.r-project.org/src/base/R-3/R-3.6.1.tar.gz
Decompress the downloaded file:
tar -zxvf R-3.6.1.tar.gz
In the decompressed folder (R-3.6.1), configure R:
./configure --prefix=path_to_install_R --enable-R-shlib
--prefix specifies a writable directory to install R into. The --enable-R-shlib flag builds R as a shared library, which rpy2 needs.
In the decompressed folder, compile R:
make
Install R into the specified directory:
make install
Add the following line to ~/.bashrc to tell the OS where to look for R:
export PATH=path_to_R_bin_directory:$PATH
Add the following line to ~/.bashrc to tell rpy2 where to look for R's shared libraries:
export LD_LIBRARY_PATH=/home/alexsong/R/3.6.1/lib64/R/lib:$LD_LIBRARY_PATH
Here /home/alexsong/R/3.6.1 is an example installation prefix; replace it with the path you passed to --prefix. Then apply the changes to the environment variables PATH and LD_LIBRARY_PATH:
source ~/.bashrc
ConSReg requires several R packages: ChIPseeker, CoReg, gglasso and RRF.
We recommend deactivating any conda environment before installing R packages, because an active environment may add environment-specific paths that cause the installation to fail. If a conda environment is active, deactivate it with:
conda deactivate
To install ChIPseeker from Bioconductor, run the following commands in R (R 3.5 or higher):
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ChIPseeker")
For older versions of R, run the following commands in R:
source("https://bioconductor.org/biocLite.R")
biocLite("ChIPseeker")
Please refer to the ChIPseeker page on Bioconductor for more details.
To install the CoReg package from GitHub, run the following commands in R:
install.packages("devtools")
library(devtools)
install_github("LiLabAtVT/CoReg")
Please refer to the GitHub page of the CoReg project (https://github.com/LiLabAtVT/CoReg) for more details.
To install the gglasso package from CRAN, run the following command in R:
install.packages("gglasso")
Please refer to the gglasso page on CRAN for more details.
To install the RRF package from CRAN, run the following command in R:
install.packages("RRF")
Please refer to the RRF page on CRAN for more details.
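After installing the packages, a quick sanity check can confirm that R sees all four of them and that R's library directory is on `LD_LIBRARY_PATH`. This is a minimal sketch, not part of ConSReg: it assumes `Rscript` is on `PATH`, relies on base R's `requireNamespace()` (which loads a package without attaching it) and `R.home("lib")` (the directory that the LD_LIBRARY_PATH line above points to), and all helper names are hypothetical.

```python
import os
import shutil
import subprocess

# R packages ConSReg needs (from the list above); names are case-sensitive.
R_PACKAGES = ["ChIPseeker", "CoReg", "gglasso", "RRF"]


def run_r(expression):
    """Run an R expression with Rscript and return its stdout, or None if Rscript is missing."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    result = subprocess.run(
        [rscript, "-e", expression],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def parse_status(text):
    """Parse lines such as 'gglasso TRUE' into {'gglasso': True}; other output is ignored."""
    status = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] in ("TRUE", "FALSE"):
            status[fields[0]] = fields[1] == "TRUE"
    return status


def r_packages_status(packages=R_PACKAGES):
    """Return {package: installed?}, or None if Rscript is not on PATH."""
    names = ", ".join("'%s'" % name for name in packages)
    # Prints one "<package> TRUE|FALSE" line per package.
    output = run_r(
        "for (p in c(%s)) cat(p, requireNamespace(p, quietly = TRUE), '\\n')" % names
    )
    return None if output is None else parse_status(output)


def on_ld_library_path(directory, ld_library_path):
    """True if `directory` is one of the colon-separated entries in LD_LIBRARY_PATH."""
    wanted = os.path.realpath(directory)
    return any(
        os.path.realpath(entry) == wanted for entry in ld_library_path.split(":") if entry
    )


if __name__ == "__main__":
    status = r_packages_status()
    if status is None:
        print("Rscript not found; check the PATH line added to ~/.bashrc.")
    else:
        for name in R_PACKAGES:
            print(name, "installed" if status.get(name) else "MISSING")
        # R.home("lib") is the directory holding R's shared library (libR.so).
        lib_dir = run_r("cat(R.home('lib'))")
        found = on_ld_library_path(lib_dir, os.environ.get("LD_LIBRARY_PATH", ""))
        print(lib_dir, "on LD_LIBRARY_PATH:", found)
```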
ConSReg can be installed by pip:
pip install ConSReg
Sometimes rpy2 may raise an error when imported in Python. This happens when rpy2 was built against a different R version from the one it links to at import time. To fix this, remove the rpy2 package and then reinstall it with the --no-cache-dir flag, so that pip does not reuse a previously built copy:
pip uninstall rpy2
pip install ConSReg --no-cache-dir
Alternatively, you may install ConSReg in development mode so that you can edit the package yourself. To do so, git clone this repository and then, in the directory that contains setup.py, run:
pip install -e .
Sample datasets can be found in the data folder.
ConSReg can take three types of genomic data as inputs:
- Open chromatin position information from ATAC-seq data, represented as a bed file with three columns: column 1 is the chromosome name (e.g., chr1, chr2); column 2 is the start position of an open chromatin region; column 3 is the end position of that region (see also the sample data files under data/atac_seq_all_peaks in this repository).
- TF binding location information from DAP-seq/ChIP-seq data, represented as narrowPeak files whose first three columns are the same as those described for the ATAC-seq data (see also the sample data files under data/dap_seq_all_peaks in this repository). Note that the binding information for different TFs should be in separate files, and each file is named after the corresponding TF. For example, the file "AT1G01060.narrowPeak" contains only the binding locations for the TF "AT1G01060".
- Differentially expressed gene (DEG) information from RNA-seq/microarray data, represented as comma-separated values (CSV) table files that should contain at least four columns: the first column holds the gene name; a column named "baseMean" holds the mean expression of the gene; a column named "log2FoldChange" holds the log2-scaled fold change; a column named "padj" holds the adjusted p-values from the differential expression test. Such tables can usually be generated with the DESeq2 package (see the DESeq2 page for more information: https://bioconductor.org/packages/release/bioc/html/DESeq2.html). ConSReg can take multiple DEG tables as inputs, each corresponding to a DEG analysis from one experiment.
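Before running an analysis, you may want to sanity-check input files against the formats described above. The sketch below is not part of ConSReg, and its rules are assumptions drawn from this description: peak files need at least three whitespace-separated columns with integer start < end, each narrowPeak file name gives the TF name, and a DEG table's header must contain baseMean, log2FoldChange and padj after the gene-name column. The function names are hypothetical.

```python
import csv
import os
import sys

# Columns a DEG table must contain besides the first (gene name) column.
DEG_COLUMNS = ("baseMean", "log2FoldChange", "padj")


def check_peak_lines(lines):
    """Return problems found in BED/narrowPeak lines; an empty list means none were found.

    Checks that each non-blank line has at least three whitespace-separated columns
    (chromosome, start, end), that start/end are integers, and that start < end.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 3:
            problems.append("line %d: expected at least 3 columns" % number)
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            problems.append("line %d: start/end are not integers" % number)
            continue
        if start >= end:
            problems.append("line %d: start %d is not less than end %d" % (number, start, end))
    return problems


def check_peak_file(path):
    with open(path) as handle:
        return check_peak_lines(handle)


def tf_name(path):
    """'data/dap_seq_all_peaks/AT1G01060.narrowPeak' -> 'AT1G01060' (one file per TF)."""
    name, extension = os.path.splitext(os.path.basename(path))
    if extension != ".narrowPeak":
        raise ValueError("expected a .narrowPeak file: " + path)
    return name


def missing_deg_columns(lines):
    """Return the required DEG columns missing from the CSV header (first row of `lines`)."""
    header = next(csv.reader(lines), [])
    return [column for column in DEG_COLUMNS if column not in header[1:]]


if __name__ == "__main__":
    # Treat *.csv arguments as DEG tables and everything else as peak files.
    for path in sys.argv[1:]:
        if path.endswith(".csv"):
            with open(path, newline="") as handle:
                missing = missing_deg_columns(handle)
            print(path, "missing DEG columns:", missing or "none")
        else:
            print(path, check_peak_file(path) or "ok")
```

Pass file paths on the command line; treating files that end in .csv as DEG tables and all others as peak files is a convention of this sketch only.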
We provide code for analyzing the sample datasets in two jupyter notebooks located in the root folder of this project: bulk_analysis.ipynb (for bulk RNA-seq data) and single_cell_analysis.ipynb (for single cell RNA-seq data).
Please cite the following paper if you use ConSReg in your research:
Qi Song, Jiyoung Lee, Shamima Akter, Ruth Grene, Song Li. "Prediction of condition-specific regulatory genes using machine learning." Nucleic acids research 48.11 (2020): e62-e62.