Financial Immunology: Dynamics and Contagion in the NASDAQ Market

Abstract

Financial markets, much like biological populations, are susceptible to contagion. Localized distress in one asset can rapidly propagate through hidden dependency channels, leading to widespread systemic failure. Why does this matter? Traditional econometric models often fail to capture the dynamic and directional nature of shock propagation during extreme events. This project applies epidemiological frameworks to financial time series to understand not just that markets crash, but how the infection spreads.

We analyze the NASDAQ market across three major crises: the Dot-Com Bubble, the Subprime Mortgage Crisis, and the COVID-19 Crash. By building dynamic Granger-Causality networks and defining specific health states (Healthy vs. Sick) for every stock, we identify the "Patient Zeros" that trigger instability and the "Super-spreaders" that amplify it. Our ultimate goal is to validate a "Pandemic Potential Index" (PPI), a novel metric designed to quantify the virulence of specific assets before a full-scale meltdown occurs. For each crisis, the same analysis is repeated at the industry-sector level.

Research Questions

Our project moves beyond simple correlation analysis to answer four specific questions about the market's systemic health:

  1. Can financial contagion be modeled as a biological epidemic?

    • Hypothesis: Assigning epidemiological states (Susceptible, Infected/Shocked, Recovered) to stocks based on return thresholds highlights propagation patterns invisible to standard time-series analysis.
  2. Who are the "Patient Zeros" and "Super-spreaders" of these historical crashes?

    • Goal: Identify the specific assets that initiated the cascade in 2000, 2008, and 2020. Are they always the largest-cap stocks, or do peripheral assets trigger the fall?
  3. Does the network topology serve as a leading indicator of distress?

    • Goal: Analyze how the density and structure of Granger-Causality networks shift before and during a crash. We look for "densification" of the network as an early warning of a collapse in daily returns.
  4. Can we define and validate a "Pandemic Potential Index" (PPI)?

    • Goal: Construct a robust metric combining network centrality (influence) and transmission probability (severity) to quantify each stock's systemic risk contribution.

Data Description & Enrichment

We use a comprehensive dataset of commonly available financial data, enriched with sector classifications to enable cross-sector analysis.

  • Primary Dataset: We process daily Open, High, Low, Close, and Volume data for 50+ major high-cap NASDAQ tickers covering the period from 1990 to the present. The raw data is sourced from the Stock Market Dataset on Kaggle, from which we kept only the stock data.

  • Sector Classification Enrichment: We assign each stock to an industry sector using the Industry Classification Benchmark (ICB) standard. This enrichment enables us to perform the analyses at the sector level and study "cross-immunity" (e.g., how a tech crash propagates to the banking sector). Note that this dataset was generated for the purpose of this project and is not taken from an otherwise publicly available source.

  • Preprocessing: Data is adjusted for stock splits and dividends. We compute log-returns to ensure stationarity of our signals and apply rolling windows (e.g., a 30-day lookback) to capture dynamic dependencies rather than static snapshots, as sketched below.
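
Concretely, in pandas (the file name and column layout here are hypothetical stand-ins for our loaders in src/data/):

    import numpy as np
    import pandas as pd

    # Load split/dividend-adjusted daily closes; "prices.csv" (dates x tickers)
    # is a hypothetical layout used only for this sketch.
    prices = pd.read_csv("prices.csv", index_col="Date", parse_dates=True)

    # Log-returns log(P_t / P_{t-1}), computed for (approximate) stationarity.
    log_returns = np.log(prices / prices.shift(1)).dropna()

    # 30-day rolling lookback, e.g. per-ticker rolling volatility of returns.
    rolling_vol = log_returns.rolling(window=30).std()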

Methods

Our approach integrates rigorous financial econometrics with sophisticated network science. The analysis pipeline is modularized in our src/ directory for reproducibility.

1. Market Regime Segmentation

Instead of arbitrarily picking dates, we use Dynamic Programming to mathematically segment market history.

  • Algorithm: We minimize the Sum of Squared Errors (SSE) of the mean market return signal to find optimal "changepoints" (sketched after this list).

  • Result: This automatically classifies the timeline into regimes: Bull/Calm, Bull/Volatile, Bear/Calm, and Bear/Stress.
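
The following is a self-contained illustration of this SSE-minimizing dynamic program, demonstrated on a synthetic mean-shifted series; it is a simplified stand-in for the implementation in src/models/:

    import numpy as np

    def sse_segmentation(x, n_segments):
        """Split x into n_segments pieces minimizing total within-segment SSE."""
        n = len(x)
        s1 = np.concatenate(([0.0], np.cumsum(x)))       # prefix sums
        s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))  # prefix sums of squares

        def cost(i, j):  # SSE of x[i:j] around its own mean
            return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

        dp = np.full((n_segments + 1, n + 1), np.inf)  # dp[k, j]: best cost of k segments over x[:j]
        back = np.zeros((n_segments + 1, n + 1), dtype=int)
        dp[0, 0] = 0.0
        for k in range(1, n_segments + 1):
            for j in range(k, n + 1):
                for i in range(k - 1, j):
                    c = dp[k - 1, i] + cost(i, j)
                    if c < dp[k, j]:
                        dp[k, j], back[k, j] = c, i

        # Walk the backpointers to recover the interior changepoint indices.
        cps, j = [], n
        for k in range(n_segments, 1, -1):
            j = back[k, j]
            cps.append(j)
        return sorted(cps)

    # Example: a synthetic return series with two mean shifts (three regimes).
    rng = np.random.default_rng(0)
    signal = np.concatenate([rng.normal(m, 0.002, 100) for m in (0.001, -0.003, 0.002)])
    print(sse_segmentation(signal, n_segments=3))  # changepoints near [100, 200]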

2. Causality Network Construction

We move beyond simple Pearson correlations (which imply symmetric relationships) to directional causality.

  • Granger Causality: For every pair of stocks $(X, Y)$, we test whether past returns of $X$ statistically predict $Y$'s returns better than $Y$'s own history alone.

  • Dynamic Matrix: This computation is repeated over sliding windows, resulting in a time-varying adjacency matrix $A_t$ where $A_{ij} = 1$ indicates an "infection pathway" from $i$ to $j$ (see the sketch below).
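
One window's network can be built with statsmodels as in the following hedged sketch; the lag order and 5% significance threshold are illustrative choices rather than our final settings:

    import pandas as pd
    from statsmodels.tsa.stattools import grangercausalitytests

    def granger_adjacency(window_returns, max_lag=2, alpha=0.05):
        """Directed adjacency for one window: A[i, j] = 1 if i Granger-causes j."""
        tickers = list(window_returns.columns)
        A = pd.DataFrame(0, index=tickers, columns=tickers)
        for x in tickers:
            for y in tickers:
                if x == y:
                    continue
                # statsmodels tests whether the SECOND column Granger-causes the
                # first, so [[y, x]] asks: do x's past returns help predict y?
                res = grangercausalitytests(window_returns[[y, x]], maxlag=max_lag)
                p_value = min(res[lag][0]["ssr_ftest"][1] for lag in res)
                if p_value < alpha:
                    A.loc[x, y] = 1  # infection pathway x -> y
        return A

Repeating this over sliding windows yields the time-varying matrix $A_t$.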

3. Epidemiological State Modeling

We implement a bespoke compartment model (SIR-like) tailored for finance:

  • State Definition:
    • Healthy: Daily return $> 30^{th}$ percentile of the window.
    • Sick (Infected): Daily return $\le 30^{th}$ percentile (significant downside shock). See the sketch after this list.
  • R0 Calculation: We compute an "Effective $R_0$" for every stock by measuring the volatility spillover to its susceptible neighbors in the network, weighted by edge strength and distance (max hops = 2).
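
A minimal sketch of the Healthy/Sick labeling, assuming a DataFrame of log-returns indexed by date with one column per ticker:

    import pandas as pd

    def health_states(log_returns, window=30, q=0.30):
        """Label each stock-day Healthy or Sick via a rolling percentile rule."""
        # Rolling 30th percentile of each stock's own returns over the lookback.
        threshold = log_returns.rolling(window).quantile(q)
        states = pd.DataFrame("Healthy", index=log_returns.index,
                              columns=log_returns.columns)
        states[log_returns <= threshold] = "Sick"
        states[threshold.isna()] = None  # first window-1 days: no threshold yet
        return states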

4. Systemic Risk Metrics (PPI)

The Pandemic Potential Index (PPI) is calculated as a composite score: $$PPI_i = \text{Centrality}_i \times \text{TransmissionProb}_i \times \text{Severity}_i$$ This metric highlights stocks that are central in the causal network AND are currently experiencing severe distress.
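
As an illustration, such a composite could be assembled with networkx as below; out-degree centrality stands in for the influence term, and transmission_prob and severity are hypothetical per-stock Series assumed to be computed upstream:

    import networkx as nx
    import pandas as pd

    def pandemic_potential_index(adjacency, transmission_prob, severity):
        """Rank stocks by PPI_i = Centrality_i * TransmissionProb_i * Severity_i."""
        G = nx.from_pandas_adjacency(adjacency, create_using=nx.DiGraph)
        # Out-degree centrality as a simple proxy for causal influence in A_t.
        centrality = pd.Series(nx.out_degree_centrality(G))
        ppi = centrality * transmission_prob * severity
        return ppi.sort_values(ascending=False)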

Organization within the Team

  • Nazar, Luca, Benajmen: Worked on prototyping the logic for the analyses and constructing the pipelines. Developed and designed the data story website.

  • Samuel, Ahmad, Luca: Worked on refining the prototyped pipelines and re-writing the codebase to maximize reutilization across its components. Responsible for the structure and content of the top-level results.ipynb notebook.

Repository Structure

The directory structure of the project is organized as follows:

├── src/                        <- Source code modules
│   ├── data/                   <- Data loading and preprocessing logic
│   ├── models/                 <- Core modeling (Segmentation, Networks, Epidemiology)
│   ├── utils/                  <- Visualization and helper functions
│   ├── scripts/                <- Execution pipelines
│   └── configuration.py        <- Global settings
│
├── results.ipynb               <- Main analysis notebook (The Data Story)
├── pip_requirements.txt        <- Python dependencies
├── tests/                      <- Unit tests
└── README.md                   <- Project documentation

How to Execute the Code

To reproduce the analysis and results, follow these steps:

  1. Clone the repository

    git clone <project_link>
    cd <project_repo>
  2. Environment Setup: It is recommended to use a virtual environment to manage dependencies:

    # Create a virtual environment
    python -m venv venv
    
    # Activate the virtual environment
    # On Windows:
    .\venv\Scripts\activate
    # On macOS/Linux:
    source venv/bin/activate
  3. Install Dependencies: Install the required Python packages:

    pip install -r pip_requirements.txt
  4. Run the Analysis: Open the main notebook, results.ipynb, to view the data story and results.

    ⚠️ Please note that the notebook is lengthy and some cells may take tens of minutes to run. The notebook is also memory-intensive; make sure you have enough RAM to run it. ⚠️

Data Story

The data story website can be found here.
