py-econometrics · Dpananos · Nov 11, 2025 · Nov 11, 2025 · Nov 11, 2025 · Nov 11, 2025
diff --git a/.cursor/plans/glm-fixe-a86255fd.plan.md b/.cursor/plans/glm-fixe-a86255fd.plan.md
@@ -0,0 +1,141 @@
+<!-- a86255fd-e559-425e-82e2-e2ad35c50047 460c8778-a2e6-429c-924f-1f88709080a0 -->
+# Extend Fixed Effects Support to feglm (Logit/Binomial Family)
+
+## Current State - Gaussian Implementation Complete
+
+Fixed effects support for Gaussian GLMs is **fully working**:
+
+- ✅ Backend infrastructure in place (`demeaner_backend` parameter)
+- ✅ Proper variance calculation using `self._df_t`
+- ✅ All tests passing (30/30): internal, R comparison, and error handling
+- ✅ Results match R's `fixest::feglm(family=gaussian())` exactly
+
+**Restriction in place** (line 102-103 in `feglm_.py`):
+
+```python
+if self._fe is not None and self._method != "feglm-gaussian":
+    raise NotImplementedError("Fixed effects are not yet supported for GLMs.")
+```
+
+## Logit Family Structure
+
+The `Felogit` class exists in `felogit_.py` and:
+
+- Inherits from `Feglm` (like `Fegaussian`)
+- Implements binomial family with logit link
+- Has `_method = "feglm-logit"`
+- Uses `_vcov_iid()` from `Feglm` (returns `self._bread`, no dispersion multiplier needed)
+- **Missing**: `demeaner_backend` parameter (not passed to parent)
+
+**Key difference from Gaussian**: Logit uses dispersion φ = 1.0, so no sigma² multiplier in variance calculation (already correct in base `Feglm._vcov_iid()`).
+
+## Implementation Steps
+
+### 1. Update Felogit to Accept demeaner_backend Parameter
+
+Modify `pyfixest/estimation/felogit_.py`:
+
+**a) Add import** (after line 8):
+
+```python
+from pyfixest.estimation.literals import DemeanerBackendOptions
+```
+
+**b) Add parameter to `__init__`** (after line 35):
+
+```python
+demeaner_backend: DemeanerBackendOptions = "numba",
+```
+
+**c) Pass to parent `super().__init__`** (after line 57):
+
+```python
+demeaner_backend=demeaner_backend,
+```
+
+### 2. Update Fixed Effects Restriction in feglm_.py
+
+Modify line 102 in `pyfixest/estimation/feglm_.py`:
+
+**Change from:**
+
+```python
+if self._fe is not None and self._method != "feglm-gaussian":
+```
+
+**To:**
+
+```python
+if self._fe is not None and self._method not in ["feglm-gaussian", "feglm-logit"]:
+```
+
+This allows both Gaussian and Logit, while still blocking Probit.
+
+### 3. Add Tests for Logit with Fixed Effects
+
+**a) Update test formulas in `test_vs_fixest.py`:**
+
+Currently `glm_fmls_with_fe` is only used for Gaussian. Update the test to include logit:
+
+**Modify `test_glm_with_fe_vs_fixest`** (around line 901):
+
+- Add `@pytest.mark.parametrize("family", ["gaussian", "logit"])`
+- Update R model creation to handle both families:
+  ```python
+  if family == "gaussian":
+      fit_r = fixest.feglm(ro.Formula(r_fml), data=data_r, family=stats.gaussian(), vcov=r_inference)
+  elif family == "logit":
+      fit_r = fixest.feglm(ro.Formula(r_fml), data=data_r, family=stats.binomial(link="logit"), vcov=r_inference)
+  ```
+
+
+**b) Update `test_errors.py`:**
+
+Modify `test_glm_errors` (around line 820):
+
+- Update to test that logit now allows fixed effects
+- Keep probit blocked
+
+### 4. Validation
+
+Run tests to ensure:
+
+- Logit with single FE: `Y ~ X1 | f1` ✓
+- Logit with multiple FE: `Y ~ X1 | f1 + f2` ✓
+- Logit with multiple covariates: `Y ~ X1 + X2 | f2` ✓
+- All inference types: `iid`, `hetero`, `CRV1` ✓
+- Results match R's `fixest::feglm(family=binomial(link="logit"))` ✓
+- IRLS converges properly with fixed effects
+
+## Key Files to Modify
+
+- `pyfixest/estimation/felogit_.py` (add demeaner_backend parameter)
+- `pyfixest/estimation/feglm_.py` (update restriction to include logit)
+- `tests/test_vs_fixest.py` (extend tests to logit family)
+- `tests/test_errors.py` (update error test for logit)
+
+## Expected Behavior
+
+After implementation:
+
+1. `pf.feglm(fml="Y ~ X1 | f1", family="logit", demeaner_backend="rust")` works
+2. Results match R's `fixest::feglm(family=binomial(link="logit"))`
+3. IRLS with fixed effects converges properly
+4. Variance calculation uses `self._bread` (no dispersion multiplier, φ=1)
+5. Probit remains blocked (for now)
+
+## Notes
+
+- Logit doesn't need a custom `_vcov_iid()` method (unlike Gaussian) because dispersion φ = 1.0
+- The IRLS algorithm in `Feglm.get_fit()` already handles demeaning at each iteration via `residualize()`
+- Main differences from Gaussian: link function, variance function, and convergence behavior
+- Separation checks are already in place and conditional on fixed effects
+
+### To-dos
+
+- [ ] Add DemeanerBackendOptions import to felogit_.py
+- [ ] Add demeaner_backend parameter to Felogit.__init__ and pass to parent
+- [ ] Update feglm_.py restriction to allow both gaussian and logit families
+- [ ] Extend test_glm_with_fe_vs_fixest to test both gaussian and logit families
+- [ ] Update test_glm_errors to allow logit with fixed effects
+- [ ] Run tests and validate logit with FE matches R fixest results
diff --git a/.cursor/plans/glm-fixed-effects-support-5e7c976a.plan.md b/.cursor/plans/glm-fixed-effects-support-5e7c976a.plan.md
@@ -0,0 +1,83 @@
+<!-- 5e7c976a-5d57-4559-9eec-e537dca963a1 9818a1cd-a593-4f12-a9f7-6caba18c5fcb -->
+# Add Fixed Effects Support to feglm (Gaussian Family)
+
+## Current State
+
+The `Feglm` class (in `feglm_.py`) currently raises `NotImplementedError` at line 99-100 when fixed effects are present. However, the infrastructure is largely in place:
+
+- The `residualize()` method exists (lines 304-323) and correctly calls `demean()`
+- Fixed effects are already handled in `prepare_model_matrix()` and `to_array()` methods
+- Separation checks (lines 102-126) are already conditional on `self._fe is not None`
+
+The `Fepois` class successfully implements fixed effects in its IRLS algorithm (lines 282-294 in `fepois_.py`) by calling `demean()` with fixed effects and weights at each iteration.
+
+## Implementation Steps
+
+### 1. Remove Fixed Effects Restriction in feglm_.py
+
+Remove the restriction in `prepare_model_matrix()` method:
+
+- Delete lines 99-100 that raise `NotImplementedError` for fixed effects
+- This will allow the rest of the code to execute when fixed effects are present
+
+### 2. Verify residualize() Method Integration
+
+The `residualize()` method (lines 304-323) already properly handles fixed effects:
+
+- Returns unchanged data when `flist is None` (no fixed effects)
+- Calls `demean()` with weights when fixed effects exist
+- This method is already called in `get_fit()` at line 182
+
+The implementation should work as-is since:
+
+- `Fegaussian` inherits from `Feglm`
+- The IRLS algorithm in `get_fit()` calls `residualize()` at each iteration (line 182)
+- Weights are properly passed as `W_tilde.flatten()` (line 186)
+
+### 3. Test the Implementation
+
+Enable and extend existing tests:
+
+**a) Enable skipped test in `test_feols_feglm_internally.py`:**
+
+- Remove the `@pytest.mark.skip` decorator on line 60
+- Run the test to verify it passes with the implementation
+
+**b) Add basic R comparison tests:**
+
+- Add tests to `test_vs_fixest.py` comparing `pf.feglm(family="gaussian")` against R's `fixest::feglm(family="gaussian")`
+- Test formulas: `"Y ~ X1 | f1"`, `"Y ~ X1 | f1 + f2"`, `"Y ~ X1 + X2 | f2"`
+- Compare: coefficients, standard errors, residuals
+- Use existing test patterns from the `glm_fmls` tests (around line 119)
+
+### 4. Validation Checklist
+
+Ensure the implementation correctly handles:
+
+- Single fixed effect: `Y ~ X1 | f1`
+- Multiple fixed effects: `Y ~ X1 | f1 + f2`
+- Fixed effects with multiple covariates: `Y ~ X1 + X2 | f1`
+- Proper weight handling in demeaning at each IRLS iteration
+- Convergence behavior (should still converge in 1 iteration for Gaussian)
+
+## Key Files to Modify
+
+- `pyfixest/estimation/feglm_.py` (remove restriction)
+- `tests/test_feols_feglm_internally.py` (enable skipped test)
+- `tests/test_vs_fixest.py` (add R comparison tests)
+
+## Expected Behavior
+
+After implementation, `pf.feglm(fml="Y ~ X1 | f1", family="gaussian")` should:
+
+1. Accept the formula without raising `NotImplementedError`
+2. Demean variables by fixed effects at each IRLS iteration
+3. Produce identical results to R's `fixest::feglm()` with Gaussian family
+4. For Gaussian family, should effectively match `feols()` results (with appropriate inference adjustments)
+
+### To-dos
+
+- [ ] Remove NotImplementedError for fixed effects in feglm_.py (lines 99-100)
+- [ ] Remove @pytest.mark.skip decorator from test_feols_feglm_internally
+- [ ] Add R comparison tests for Gaussian GLM with fixed effects to test_vs_fixest.py
+- [ ] Run tests and validate against R fixest results
diff --git a/.cursor/rules/pixi.mdc b/.cursor/rules/pixi.mdc
@@ -0,0 +1,12 @@
+---
+description: Running unit tests
+alwaysApply: false
+---
+When running unit tests, always:
+
+* use pixi to run the test.  We have a dev environement called "dev" that should house all dependencies.
+
+When running unit tests, never:
+
+* Alter the tolerance to get the test to pass.  Use a default tolerance of 1e-06 when comparing two floats.
+* Alter the test to make them easier to pass, unless explicitly told so.
diff --git a/FIX_SUMMARY.md b/FIX_SUMMARY.md
@@ -0,0 +1,109 @@
+# Fix Summary: feglm Gaussian Standard Errors with Fixed Effects
+
+## Problem
+The test `test_glm_with_fe_vs_fixest` was failing because `pf.feglm()` with `family="gaussian"` and fixed effects produced **incorrect standard errors** (~9% too large initially).
+
+## Root Causes Identified
+
+### 1. Incorrect Residual Computation
+**File**: `pyfixest/estimation/feglm_.py:230-252`
+
+**Issue**: Residuals were computed as `Y_original - predictions_demeaned`, mixing original and demeaned spaces.
+
+**Fix**: For Gaussian GLM with fixed effects, demean Y and compute residuals as `Y_demeaned - X_demeaned @ beta`.
+
+### 2. Incorrect Sigma² Denominator  
+**File**: `pyfixest/estimation/fegaussian_.py:104`
+
+**Issue**: Used `N` as denominator instead of `df_t` (degrees of freedom).
+
+**Fix**: Changed to use `df_t` to match `feols` behavior.
+
+### 3. Incorrect Scores Computation
+**File**: `pyfixest/estimation/feglm_.py:247-249`
+
+**Issue**: Scores used original X with demeaned residuals: `scores = u_hat_demeaned * X_original`.
+
+**Fix**: For Gaussian GLM with fixed effects, use demeaned X: `scores = u_hat_demeaned * X_demeaned`.
+
+## Changes Made
+
+### 1. pyfixest/estimation/feglm_.py (lines 230-263)
+
+```python
+if self._method == "feglm-gaussian" and self._fe is not None:
+    # For Gaussian with identity link and fixed effects,
+    # residuals must be computed in the demeaned space to match feols.
+    # Demean Y and compute residuals as Y_demeaned - X_demeaned @ beta
+    y_demeaned, _ = self.residualize(
+        v=self._Y,
+        X=np.zeros((self._N, 0)),  # Just demean Y, no X needed
+        flist=self._fe,
+        weights=W_tilde.flatten(),
+        tol=self._fixef_tol,
+        maxiter=self._fixef_maxiter,
+    )
+    # Residuals in demeaned space
+    self._u_hat_response = (y_demeaned.flatten() - X_dotdot @ beta).flatten()
+    self._u_hat_working = self._u_hat_response
+
+    # For sandwich variance, scores must also use demeaned X
+    self._scores_response = self._u_hat_response[:, None] * X_dotdot
+    self._scores_working = self._u_hat_working[:, None] * X_dotdot
+    self._scores = self._scores_response  # Use response scores for Gaussian
+elif self._method == "feglm-gaussian":
+    # Gaussian without fixed effects
+    self._u_hat_response = (self._Y.flatten() - self._get_mu(theta=eta)).flatten()
+    self._u_hat_working = self._u_hat_response
+    self._scores_response = self._u_hat_response[:, None] * self._X
+    self._scores_working = self._u_hat_working[:, None] * self._X
+    self._scores = self._get_score(y=self._Y.flatten(), X=self._X, mu=mu, eta=eta)
+else:
+    # For other GLM families
+    self._u_hat_response = (self._Y.flatten() - self._get_mu(theta=eta)).flatten()
+    self._u_hat_working = (v_dotdot / W_tilde).flatten()
+    self._scores_response = self._u_hat_response[:, None] * self._X
+    self._scores_working = self._u_hat_working[:, None] * self._X
+    self._scores = self._get_score(y=self._Y.flatten(), X=self._X, mu=mu, eta=eta)
+```
+
+### 2. pyfixest/estimation/fegaussian_.py (lines 100-107)
+
+```python
+def _vcov_iid(self):
+    _u_hat = self._u_hat
+    _bread = self._bread
+    # Use df_t (degrees of freedom) for denominator, matching feols behavior
+    sigma2 = np.sum(_u_hat.flatten() ** 2) / self._df_t
+    _vcov = _bread * sigma2
+
+    return _vcov
+```
+
+## Test Results
+
+**Before Fix**:
+- Python SE = 0.2596
+- R SE = 0.2379  
+- Difference = 9%
+- Test status: **FAILED**
+
+**After Fix**:
+- Python SE = 0.2379 (IID), 0.2440 (hetero)
+- R SE = 0.2379 (IID), 0.2440 (hetero)
+- Difference = < 0.0001%
+- Test status: **18/18 PASSED** ✓
+
+## Impact
+
+- ✅ IID standard errors now match R fixest exactly
+- ✅ Heteroskedastic standard errors now match R fixest exactly  
+- ✅ Clustered standard errors now match R fixest exactly
+- ✅ All formulas tested (single FE, two-way FE, multiple covariates)
+- ✅ Both dropna=True and dropna=False cases pass
+
+## Notes
+
+- Fix only applies to Gaussian family with fixed effects
+- Other GLM families (logit, probit) with fixed effects not yet supported (raise NotImplementedError)
+- Gaussian without fixed effects unchanged (already correct)
diff --git a/pyfixest/estimation/fegaussian_.py b/pyfixest/estimation/fegaussian_.py
@@ -6,6 +6,7 @@
 
 from pyfixest.estimation.feglm_ import Feglm
 from pyfixest.estimation.FormulaParser import FixestFormula
+from pyfixest.estimation.literals import DemeanerBackendOptions
 
 
 class Fegaussian(Feglm):
@@ -33,6 +34,7 @@ def __init__(
             "scipy.sparse.linalg.lsqr",
             "jax",
         ],
+        demeaner_backend: DemeanerBackendOptions = "numba",
         store_data: bool = True,
         copy_data: bool = True,
         lean: bool = False,
@@ -56,6 +58,7 @@ def __init__(
             tol=tol,
             maxiter=maxiter,
             solver=solver,
+            demeaner_backend=demeaner_backend,
             store_data=store_data,
             copy_data=copy_data,
             lean=lean,
@@ -98,3 +101,12 @@ def _get_score(
         self, y: np.ndarray, X: np.ndarray, mu: np.ndarray, eta: np.ndarray
     ) -> np.ndarray:
         return (y - mu)[:, None] * X
+
+    def _vcov_iid(self):
+        _u_hat = self._u_hat
+        _bread = self._bread
+        # Use df_t (degrees of freedom) for denominator, matching feols behavior
+        sigma2 = np.sum(_u_hat.flatten() ** 2) / self._df_t
+        _vcov = _bread * sigma2
+
+        return _vcov