This release accompanies the article:
"Integrating co-expression network analysis and machine learning to reveal the regulatory landscape of GPD genes in Chlamydomonas reinhardtii under salinity stress"
It contains the complete computational workflow used in the study, structured into modular R scripts that cover:
- Weighted Gene Co-expression Network Analysis (WGCNA): Identification of co-expression modules under salt stress (200 mM NaCl), including hub gene detection and GPD gene connectivity analysis.
- Differential Expression Analysis (DESeq2): Time-course contrasts across multiple stress durations (2–72 h), with annotated outputs and visualizations (volcano plots, heatmaps, Venn/UpSet diagrams).
- Gene Ontology Enrichment (topGO): Functional annotation of WGCNA modules across BP, MF, and CC ontologies, with multiple testing correction and publication-ready plots.
- Machine Learning Validation (Random Forest): Supervised classification of module assignments with performance metrics (AUC, ROC, confusion matrix), UMAP visualization, and misclassification analysis.
- Module Preservation Analysis: Permutation-based stability assessment of co-expression modules using Z-summary and medianRank statistics.
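For readers unfamiliar with hub gene detection, the idea behind connectivity in a weighted co-expression network can be sketched in a few lines. This is a minimal, language-neutral illustration in Python and is not taken from the release's R scripts: the expression values, gene names, and the power `beta = 6` are all hypothetical, and the helper functions `pearson` and `connectivity` are introduced here only for illustration. It assumes an unsigned network in which the adjacency between two genes is the absolute Pearson correlation raised to a soft-threshold power, and a gene's connectivity is the sum of its adjacencies to all other genes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def connectivity(expr, beta=6):
    """Sum of soft-thresholded adjacencies |cor|^beta for each gene.

    `expr` maps gene name -> expression values across samples.
    `beta` is an illustrative soft-threshold power (an assumption,
    not the value used in the study).
    """
    genes = list(expr)
    return {
        g: sum(abs(pearson(expr[g], expr[h])) ** beta for h in genes if h != g)
        for g in genes
    }


# Hypothetical toy data: geneA, geneB, and geneC are tightly co-expressed,
# while geneD follows an unrelated pattern.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [1.1, 2.1, 2.9, 4.2, 5.1],
    "geneC": [2.0, 4.1, 6.0, 7.9, 10.2],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],
}

k = connectivity(expr)
hub = max(k, key=k.get)           # most connected gene in the toy module
least = min(k, key=k.get)         # least connected gene
print(hub, least)
```

Raising the correlation to a power suppresses weak correlations, so a gene such as `geneD` contributes almost nothing to its neighbours' connectivity, while tightly co-expressed genes emerge as hub candidates.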
All scripts are designed for reproducibility and transparency, with session info files and fixed random seeds. Input data includes raw RNA-seq counts and protein annotations. This version reflects the final validated pipeline used in the publication and is tagged as v1.0.0.
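The role of fixed random seeds in permutation-based steps can be illustrated with a short sketch. It is written in Python purely as an illustration and is not part of the release's R scripts: the module labels, the seed value, and the helper `permuted_labels` are hypothetical. It shows that two runs seeded identically produce the same sequence of label permutations, which is what makes a permutation-derived statistic repeatable.

```python
import random


def permuted_labels(labels, n_perm, seed):
    """Return n_perm shuffled copies of `labels` from a seeded generator."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    out = []
    for _ in range(n_perm):
        shuffled = labels[:]   # copy so the input list is never modified
        rng.shuffle(shuffled)
        out.append(shuffled)
    return out


# Hypothetical module assignments for six genes.
labels = ["blue", "blue", "brown", "brown", "turquoise", "turquoise"]

run1 = permuted_labels(labels, n_perm=3, seed=42)
run2 = permuted_labels(labels, n_perm=3, seed=42)
print(run1 == run2)
```

Using a dedicated generator object rather than the global one keeps the seeded sequence independent of any other random calls made elsewhere in a script.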