<!-- bdcc8fc1-5a9f-4b9b-a057-5e06b697beac 0568a9a9-d98d-469d-ace5-dd2c44b3f5c5 -->
# Robust Dependency Extraction System

## Overview

Make the dependency extraction script resilient to repository structure changes. It should read source file locations from a configuration file, discover files through glob patterns and fallback locations, and validate the configuration before extraction. Maintenance procedures should be documented thoroughly.

## Implementation Steps

### 1. Create Configuration File (`config.yaml`)

Create `scripts_extract_dependencies/config.yaml` with:

- Component definitions (trtllm, vllm, sglang, operator, shared)
- Source file locations, given as glob patterns or as lists of fallback paths
- Baseline dependency count
- GitHub repository settings (repository and branch)

Structure:

```yaml
github:
  repo: "ai-dynamo/dynamo"
  branch: "main"

baseline:
  dependency_count: 251

components:
  trtllm:
    dockerfiles:
      - "container/Dockerfile.trtllm"
      - "containers/Dockerfile.trtllm" # fallback
    scripts: []

  vllm:
    dockerfiles:
      - "container/Dockerfile.vllm"
    scripts:
      - "container/deps/vllm/install_vllm.sh"

  sglang:
    dockerfiles:
      - "container/Dockerfile.sglang"

  operator:
    dockerfiles:
      - "deploy/cloud/operator/Dockerfile"
    go_modules:
      - "deploy/cloud/operator/go.mod"

  shared:
    dockerfiles:
      - "container/Dockerfile"
    requirements:
      - pattern: "container/deps/requirements*.txt"
        exclude: []
    pyproject:
      - "pyproject.toml"
      - "benchmarks/pyproject.toml"
```

### 2. Add Configuration Loader

Modify `extract_dependency_versions.py`:

- Add a `load_config()` method to the `DependencyExtractor` class
- Parse YAML with PyYAML (add `pyyaml` to the dependencies if it is not already present), or accept a JSON config file as a fallback when PyYAML is unavailable
- Validate the configuration structure
- Merge CLI arguments with config file settings
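
A minimal sketch of the loader follows. It is written as standalone functions rather than `DependencyExtractor` methods, because the class's current interface isn't shown here. It assumes PyYAML's `yaml.safe_load` for YAML files and treats a `.json` file as the fallback format. The allowed file-list keys come from the example config above. The `repo`/`branch` keyword overrides are hypothetical stand-ins for whatever CLI options the script actually exposes, and letting CLI values win is an assumed precedence.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Mapping, Optional

try:
    import yaml  # PyYAML, needed only for .yaml/.yml configs
except ImportError:
    yaml = None

# Sections and per-component file-list keys taken from the example config.yaml.
REQUIRED_SECTIONS = ("github", "baseline", "components")
FILE_KEYS = ("dockerfiles", "scripts", "requirements", "pyproject", "go_modules")


def load_config(path: Path) -> Dict[str, Any]:
    """Load YAML with PyYAML, or JSON when the file has a .json suffix."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".json":
        return json.loads(text)
    if yaml is None:
        raise RuntimeError(f"PyYAML is not installed; install pyyaml or use a .json config instead of {path}")
    return yaml.safe_load(text) or {}


def validate_config(config: Mapping[str, Any]) -> List[str]:
    """Return human-readable structural problems; an empty list means the shape looks valid."""
    problems = [f"missing top-level section '{key}'" for key in REQUIRED_SECTIONS if key not in config]
    components = config.get("components", {})
    if not isinstance(components, dict):
        return problems + ["'components' must be a mapping"]
    for name, spec in components.items():
        if not isinstance(spec, dict):
            problems.append(f"component '{name}' must be a mapping")
            continue
        for key, entries in spec.items():
            if key not in FILE_KEYS:
                problems.append(f"component '{name}': unknown key '{key}'")
            elif not isinstance(entries, list):
                problems.append(f"component '{name}': '{key}' must be a list")
    return problems


def merge_cli_args(config: Dict[str, Any], repo: Optional[str] = None,
                   branch: Optional[str] = None) -> Dict[str, Any]:
    """Overlay CLI values on the 'github' section; a value given on the CLI wins (assumed precedence)."""
    merged = dict(config)
    github = dict(merged.get("github") or {})
    if repo is not None:
        github["repo"] = repo
    if branch is not None:
        github["branch"] = branch
    merged["github"] = github  # the caller's original 'github' dict is left untouched
    return merged
```

In the real script these would likely become methods that store the merged config on the extractor instance.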
| 71 | + |
| 72 | +### 3. Implement File Discovery |
| 73 | + |
| 74 | +Add new methods to DependencyExtractor: |
| 75 | + |
| 76 | +- `discover_files(patterns: List[str]) -> List[Path]`: Find files matching patterns with fallbacks |
| 77 | +- `validate_critical_files() -> Dict[str, bool]`: Check if critical files exist |
| 78 | +- `find_file_alternatives(base_pattern: str) -> Optional[Path]`: Try common variations |
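
One possible shape for the discovery helpers is sketched below with `pathlib` and `fnmatch`. The example config does not distinguish fallbacks from multiple required files: the trtllm `dockerfiles` list holds a fallback, while `pyproject` lists two distinct files. This sketch therefore assumes each list means "collect every entry that exists". Some further assumptions:

- `exclude` globs are matched against repo-relative paths.
- The `container`/`containers` swap in the hypothetical `DIR_VARIANTS` table is the only "common variation" tried.
- The extra `repo_root` parameter is not part of the planned signatures.

```python
import fnmatch
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Union

Entry = Union[str, Dict[str, object]]

# Hypothetical table of directory-name variants tried when a configured path is missing.
DIR_VARIANTS = {"container": "containers", "containers": "container"}


def find_file_alternatives(repo_root: Path, base_pattern: str) -> Optional[Path]:
    """Try simple variations of a missing path; here, only the container/containers swap."""
    parts = Path(base_pattern).parts
    for i, part in enumerate(parts):
        if part in DIR_VARIANTS:
            candidate = repo_root.joinpath(*parts[:i], DIR_VARIANTS[part], *parts[i + 1:])
            if candidate.is_file():
                return candidate
    return None


def discover_files(repo_root: Path, entries: Iterable[Entry]) -> List[Path]:
    """Resolve config entries (paths, globs, or {pattern, exclude} mappings) to existing files.

    Every entry that exists is collected, de-duplicated, in config order.
    """
    found: List[Path] = []
    for entry in entries:
        if isinstance(entry, dict):
            pattern = str(entry["pattern"])
            excludes = [str(x) for x in entry.get("exclude") or []]
        else:
            pattern, excludes = entry, []
        matches = sorted(p for p in repo_root.glob(pattern) if p.is_file())
        if not matches:
            alternative = find_file_alternatives(repo_root, pattern)
            matches = [alternative] if alternative else []
        for path in matches:
            rel = path.relative_to(repo_root).as_posix()
            excluded = any(fnmatch.fnmatch(rel, ex) for ex in excludes)
            if not excluded and path not in found:
                found.append(path)
    return found


def validate_critical_files(repo_root: Path, critical: Iterable[str]) -> Dict[str, bool]:
    """Map each critical path to whether it, or a known alternative, exists."""
    return {
        rel: (repo_root / rel).is_file() or find_file_alternatives(repo_root, rel) is not None
        for rel in critical
    }
```

A category is reported as missing only when its whole list resolves to nothing. This keeps fallback entries from producing spurious errors.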
| 79 | + |
| 80 | +Update `extract_all()` to: |
| 81 | + |
| 82 | +- Use config-driven file discovery instead of hardcoded paths |
| 83 | +- Try multiple location patterns before failing |
| 84 | +- Report missing files with suggestions |
| 85 | +- Continue processing other components even if one fails |
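
The next sketch covers per-component isolation and missing-file suggestions, under these assumptions:

- Each component's extractor is modeled as a zero-argument callable.
- `run_extractors` is a hypothetical stand-in for the loop inside `extract_all()`.
- Failures are recorded instead of re-raised.
- Suggestions come from `difflib.get_close_matches` over every file under the repo root.

```python
import difflib
from pathlib import Path
from typing import Callable, Dict, List


def suggest_paths(repo_root: Path, missing: str, limit: int = 3) -> List[str]:
    """Suggest existing files whose repo-relative paths resemble a missing path."""
    candidates = [p.relative_to(repo_root).as_posix() for p in repo_root.rglob("*") if p.is_file()]
    return difflib.get_close_matches(missing, candidates, n=limit, cutoff=0.6)


def run_extractors(extractors: Dict[str, Callable[[], List[str]]]) -> Dict[str, Dict[str, object]]:
    """Run every component extractor; a failure is recorded and the loop moves on."""
    results: Dict[str, Dict[str, object]] = {}
    for name, extract in extractors.items():
        try:
            results[name] = {"ok": True, "dependencies": extract()}
        except Exception as exc:  # isolate failures so other components still run
            results[name] = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return results
```

Walking the whole tree with `rglob` is simple but can be slow on large checkouts. A real implementation might restrict the search to the directories named in the config.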
| 86 | + |
| 87 | +### 4. Enhanced Error Handling |
| 88 | + |
| 89 | +Add comprehensive error tracking: |
| 90 | + |
| 91 | +- Track missing files separately from extraction errors |
| 92 | +- Collect warnings for unversioned dependencies |
| 93 | +- Generate summary report of extraction success/failures |
| 94 | +- Add `--strict` mode that fails on missing files vs. warning mode (default) |
| 95 | + |
| 96 | +Add new summary sections: |
| 97 | + |
| 98 | +``` |
| 99 | +Extraction Summary: |
| 100 | + Files Processed: 15/18 |
| 101 | + Files Missing: 3 |
| 102 | + - container/deps/requirements.standard.txt (optional) |
| 103 | + - ... |
| 104 | + Components: |
| 105 | + trtllm: ✓ Complete |
| 106 | + vllm: ⚠ Partial (missing install script) |
| 107 | + ... |
| 108 | +``` |
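
A minimal sketch of the tracking and summary output follows, built around a hypothetical `ExtractionReport` dataclass. It keeps missing files, each flagged as optional or required, apart from extraction errors and warnings. It also reproduces the summary layout shown above. The plan does not specify how `--strict` treats optional files; this sketch assumes only required missing files fail the run.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExtractionReport:
    """Tracks missing files separately from extraction errors and warnings."""
    processed: List[str] = field(default_factory=list)
    missing: Dict[str, bool] = field(default_factory=dict)  # path -> True if optional
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    components: Dict[str, str] = field(default_factory=dict)  # name -> status text

    def note_unversioned(self, component: str, package: str) -> None:
        """Record a warning for a dependency that has no version specifier."""
        self.warnings.append(f"{component}: '{package}' has no version specifier")

    def exit_code(self, strict: bool) -> int:
        """Non-zero on extraction errors; in strict mode, also on any required missing file."""
        if self.errors:
            return 1
        if strict and any(not optional for optional in self.missing.values()):
            return 1
        return 0

    def summary(self) -> str:
        """Render the summary block in the layout used by this plan."""
        total = len(self.processed) + len(self.missing)
        lines = [
            "Extraction Summary:",
            f"  Files Processed: {len(self.processed)}/{total}",
            f"  Files Missing: {len(self.missing)}",
        ]
        for path, optional in self.missing.items():
            lines.append(f"    - {path}" + (" (optional)" if optional else ""))
        lines.append("  Components:")
        for name, status in self.components.items():
            lines.append(f"    {name}: {status}")
        return "\n".join(lines)
```

Keeping the exit-code decision in one method means warning mode and strict mode share the same bookkeeping and differ only in the final check.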

### 5. Create Maintenance Documentation

Create `scripts_extract_dependencies/MAINTENANCE.md`:

**Sections:**

- How to add new components (step by step)
- How to add new file types (requirements files, Dockerfiles, etc.)
- How to update file paths when the repo structure changes
- How to update extraction patterns for new file formats
- Troubleshooting guide for common issues
- Configuration file reference
- How to update the baseline dependency count
- Testing checklist to run before committing changes

### 6. Add Validation & Testing

Add a `--validate` mode:

- Check the config file syntax
- Verify that all configured paths exist
- Test extraction patterns without writing output
- Report configuration issues
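
For the path check in `--validate`, here is a sketch that follows the example config's shape. Plain strings are treated as paths or globs, and mapping entries use their `pattern` key. The reporting has two levels: a warning for each unmatched entry, and an error only when a whole list matches nothing. That split is an assumption, chosen so that fallback entries such as `containers/Dockerfile.trtllm` don't fail validation. The function name `check_configured_paths` is hypothetical.

```python
from pathlib import Path
from typing import Any, Dict, List


def check_configured_paths(config: Dict[str, Any], repo_root: Path) -> List[str]:
    """Report configured entries that match no file; nothing is extracted or written."""
    issues: List[str] = []
    for component, spec in (config.get("components") or {}).items():
        for key, entries in spec.items():
            matched_any = False
            for entry in entries:
                pattern = entry["pattern"] if isinstance(entry, dict) else entry
                if any(repo_root.glob(pattern)):
                    matched_any = True
                else:
                    issues.append(f"warning: {component}.{key}: no match for '{pattern}'")
            # An empty list (e.g. `scripts: []`) is treated as intentionally empty.
            if entries and not matched_any:
                issues.append(f"error: {component}.{key}: none of the configured paths exist")
    return issues
```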

Add a `--dry-run` mode:

- Display the discovered files and which of them would be processed
- Skip the actual extraction
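
The command-line surface can be sketched with `argparse`. The `--strict`, `--validate`, and `--dry-run` flags come from this plan, while `--config` and its default are hypothetical. Making `--validate` and `--dry-run` mutually exclusive is a design assumption. `render_dry_run` only formats an already-discovered file list, so dry-run mode never touches the extraction code.

```python
import argparse
from typing import Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Extract dependency versions (sketch).")
    parser.add_argument("--config", default="config.yaml", help="path to the configuration file")
    parser.add_argument("--strict", action="store_true", help="fail on missing files instead of warning")
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--validate", action="store_true", help="check config and paths, then exit")
    mode.add_argument("--dry-run", action="store_true", help="list discovered files, skip extraction")
    return parser


def select_mode(argv: Optional[List[str]] = None) -> str:
    """Map parsed flags to one of three run modes."""
    args = build_parser().parse_args(argv)
    if args.validate:
        return "validate"
    if args.dry_run:
        return "dry-run"
    return "extract"


def render_dry_run(discovered: Dict[str, List[str]]) -> str:
    """Format the files each component would process; nothing is extracted or written."""
    lines: List[str] = []
    for component, files in discovered.items():
        lines.append(f"{component}: {len(files)} file(s)")
        lines.extend(f"  {path}" for path in files)
    return "\n".join(lines)
```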

### 7. Update README

Update `scripts_extract_dependencies/README.md`:

- Add a section on the configuration file
- Document file discovery behavior
- Explain how to handle missing files
- Add a troubleshooting section
- Link to `MAINTENANCE.md`
- Add examples for common maintenance tasks

### 8. Add Version Detection Improvements

Enhance the extraction methods:

- Use more robust regex patterns for version strings
- Support more version specifier formats (`>=`, `~=`, `^`, etc.)
- Extract versions from comments when present
- Add heuristics that infer versions from Git tags or branches when `latest` is used
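
Here is a sketch of a requirement-line parser covering the specifier formats listed above. When a line has no specifier, it falls back to reading a version out of a trailing comment. This is a regex-based approximation, not a full PEP 508 parser, and it has these limitations:

- It captures only the first specifier of a range such as `>=1.0,<2.0`.
- It skips option lines (`-r`, `-e`) and URL/VCS requirements.
- It reports a comment-derived version with the pseudo-operator `"comment"`, an assumption of this sketch.
- It does not cover the Git-tag heuristic for `latest`.

Where strict PEP 440/508 handling matters, the `packaging` library's `Requirement` class is a sturdier base, though it does not accept `^`.

```python
import re
from typing import Optional, Tuple

# Name, optional extras, then an optional operator and version; longer operators are listed first.
_SPEC = re.compile(
    r"""^\s*
    (?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)      # package name
    (?:\[[^\]]*\])?                           # optional extras, e.g. [standard]
    \s*
    (?:(?P<op>===|==|~=|!=|>=|<=|>|<|\^|~)\s*(?P<version>[^\s,;#]+))?
    """,
    re.VERBOSE,
)
# A version mentioned in a trailing comment, e.g. "# pinned to 1.2.3" or "# v1.2".
_COMMENT_VERSION = re.compile(r"#.*?\bv?(\d+(?:\.\d+)+)")


def parse_requirement(line: str) -> Optional[Tuple[str, Optional[str], Optional[str]]]:
    """Return (name, operator, version) for one requirement line, or None if it is not a package.

    With no specifier, fall back to a version found in a trailing comment.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith(("#", "-")) or "://" in stripped:
        return None
    match = _SPEC.match(stripped)
    if match is None:
        return None
    name, op, version = match.group("name", "op", "version")
    if version is None:
        comment = _COMMENT_VERSION.search(stripped)
        if comment:
            op, version = "comment", comment.group(1)
    return name, op, version
```

Unversioned results (`op` and `version` both `None`) are the natural input for the unversioned-dependency warnings described in step 4.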

## Files to Create/Modify

**New Files:**

- `scripts_extract_dependencies/config.yaml` - Configuration
- `scripts_extract_dependencies/MAINTENANCE.md` - Maintenance guide

**Modified Files:**

- `scripts_extract_dependencies/extract_dependency_versions.py` - Add config loading, discovery, validation
- `scripts_extract_dependencies/README.md` - Add config documentation, update examples

## Expected Outcomes

After implementation:

1. Script survives file moves - uses discovery patterns
2. Easy to add new components - edit `config.yaml`
3. Clear error messages - shows what's missing and where to look
4. Maintainable - documentation guides future updates
5. Validated - catches config errors before extraction
6. Flexible - multiple fallback locations, graceful degradation

### To-dos

- [ ] Create `config.yaml` with component definitions, file patterns, and settings
- [ ] Add configuration loading and validation to the `DependencyExtractor` class
- [ ] Implement file discovery with glob patterns and fallback locations
- [ ] Add comprehensive error tracking and reporting with strict/warning modes
- [ ] Create `MAINTENANCE.md` with guides for adding components, updating paths, and troubleshooting
- [ ] Add `--validate` and `--dry-run` modes for testing the configuration
- [ ] Update `README.md` with configuration documentation and troubleshooting
- [ ] Enhance version extraction with better patterns and heuristics