| 1 | +# Visual Illustration of the Conditions Mixing Problem |
| 2 | + |
| 3 | +## The Current Problem |
| 4 | + |
| 5 | +### What We Have Now (Mixed Concepts) |
| 6 | +``` |
| 7 | +_conditions = {'if', 'for', 'while', '&&', '||', '?', 'catch', 'case'} |
| 8 | +                ↓      ↓       ↓      ↓     ↓     ↓      ↓       ↓ |
| 9 | +               [---------------- all mixed together -----------------] |
| 10 | +``` |
| 11 | + |
| 12 | +### What We Should Have (Separated Concepts) |
| 13 | + |
| 14 | +``` |
| 15 | +_control_flow_keywords = {'if', 'for', 'while', 'catch'} |
| 16 | +                           ↓      ↓       ↓        ↓ |
| 17 | +                          [- Control Flow Structures --] |
| 18 | + |
| 19 | +_logical_operators = {'&&', '||'} |
| 20 | +                       ↓     ↓ |
| 21 | +                   [- Logical Ops -] |
| 22 | + |
| 23 | +_case_keywords = {'case'} |
| 24 | +                    ↓ |
| 25 | +            [- Case Labels -] |
| 26 | + |
| 27 | +_ternary_operators = {'?'} |
| 28 | +                       ↓ |
| 29 | +                 [- Ternary -] |
| 30 | +``` |
| 31 | + |
| 32 | +## Real-World Examples |
| 33 | + |
| 34 | +### Example 1: Python Code |
| 35 | +```python |
| 36 | +def complex_function(x, y, z): |
| 37 | +    if x > 0 and y > 0 or z < 10:  # CCN: +3 (if, and, or) |
| 38 | +        for i in range(10):  # CCN: +1 (for) |
| 39 | +            while i < 5:  # CCN: +1 (while) |
| 40 | +                try: |
| 41 | +                    process() |
| 42 | +                except Exception:  # CCN: +1 (except) |
| 43 | +                    pass |
| 44 | +    return result if result else 0  # CCN: +1 (no '?' in Python, but the 'if' token of the conditional expression is in _conditions) |
| 45 | +# Total CCN: 8 (1 base + 7) |
| 46 | +``` |
| 47 | + |
| 48 | +**Current Python `_conditions`:** |
| 49 | +```python |
| 50 | +_conditions = {'if', 'for', 'while', 'and', 'or', 'elif', 'except', 'finally'} |
| 51 | +# 🔀 🔀 🔀 🔗 🔗 🔀 🔀 🔀 |
| 52 | +``` |
| 53 | + |
| 54 | +**Problem**: Everything is in one set, so extensions can't easily target just `and`/`or` |
| 55 | + |
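To make the token counting above concrete, here is a minimal sketch that counts condition tokens with Python's standard `tokenize` module. The names `count_condition_tokens` and `PY_CONDITIONS` are illustrative assumptions, not lizard's actual API; the set is copied from the `_conditions` shown above.

```python
import io
import tokenize

# Assumed copy of the Python reader's set shown above (illustrative only).
PY_CONDITIONS = {'if', 'for', 'while', 'and', 'or', 'elif', 'except', 'finally'}


def count_condition_tokens(source, conditions=PY_CONDITIONS):
    """Return 1 (base) plus the number of NAME/OP tokens found in `conditions`."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    hits = sum(
        1
        for tok in tokens
        if tok.type in (tokenize.NAME, tokenize.OP) and tok.string in conditions
    )
    return 1 + hits


snippet = "if x > 0 and y > 0 or z < 10:\n    pass\n"

# Strict counting: 'if', 'and', 'or' each add one.
print(count_condition_tokens(snippet))

# Targeting only the logical operators requires knowing which members of the
# mixed set they are, which is exactly the problem described above.
print(count_condition_tokens(snippet, PY_CONDITIONS - {'and', 'or'}))
```

Because the counter only checks set membership, it cannot tell a control-flow keyword from a logical operator; that distinction has to come from how the sets are organized.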
| 56 | +### Example 2: C++ Code |
| 57 | +```cpp |
| 58 | +int complex_function(int x, int y) { |
| 59 | +    if (x > 0 && y > 0 || x < -10) {  // CCN: +3 (if, &&, ||) |
| 60 | +        switch (x) {  // CCN: +0 (switch itself) |
| 61 | +            case 1:  // CCN: +1 (case) |
| 62 | +            case 2:  // CCN: +1 (case) |
| 63 | +            case 3:  // CCN: +1 (case) |
| 64 | +                break; |
| 65 | +        } |
| 66 | +    } |
| 67 | +    return (x > 0) ? y : -y;  // CCN: +1 (ternary ?) |
| 68 | +} |
| 69 | +// Total CCN: 8 (1 base + 7) |
| 70 | +``` |
| 71 | + |
| 72 | +**Current Base `_conditions`:** |
| 73 | +```python |
| 74 | +_conditions = {'if', 'for', 'while', '&&', '||', '?', 'catch', 'case'} |
| 75 | +# 🔀 🔀 🔀 🔗 🔗 ❓ 🔀 🔢 |
| 76 | +``` |
| 77 | + |
| 78 | +**Problem**: Can't distinguish case counting from other constructs |
| 79 | + |
| 80 | +## How Extensions Are Affected |
| 81 | + |
| 82 | +### Extension: lizardnonstrict.py |
| 83 | +**Goal**: Count complexity WITHOUT logical operators (only control flow) |
| 84 | + |
| 85 | +**Current Implementation (Hacky):** |
| 86 | +```python |
| 87 | +def __call__(self, tokens, reader): |
| 88 | +    reader.conditions -= set(['&&', '||', 'and', 'or'])  # Remove by hardcoding |
| 89 | +    return tokens |
| 90 | +``` |
| 91 | + |
| 92 | +**Problems:** |
| 93 | +- Hard-coded list of operators |
| 94 | +- Must know all languages' logical operators |
| 95 | +- Easy to miss new operators |
| 96 | + |
| 97 | +**With Separation (Clean):** |
| 98 | +```python |
| 99 | +def __call__(self, tokens, reader): |
| 100 | +    reader.conditions -= reader.logical_operators  # Semantic! |
| 101 | +    return tokens |
| 102 | +``` |
| 103 | + |
| 104 | +### Extension: lizardmccabe.py |
| 105 | +**Goal**: Count only FIRST case in switch (McCabe's definition) |
| 106 | + |
| 107 | +**Current Implementation (Complex State Machine):** |
| 108 | +```python |
| 109 | +def _after_a_case(self, token): |
| 110 | +    if token == "case": |
| 111 | +        self.context.add_condition(-1)  # Subtract for consecutive cases |
| 112 | +``` |
| 113 | + |
| 114 | +**Problems:** |
| 115 | +- Needs complex state machine |
| 116 | +- Hard to distinguish 'case' from other conditions |
| 117 | +- Error-prone |
| 118 | + |
| 119 | +**With Separation (Clearer Intent):** |
| 120 | +```python |
| 121 | +def __call__(self, tokens, reader): |
| 122 | +    # Placeholder: target reader.case_keywords specifically; a state machine is still needed, but the intent is clearer |
| 123 | +    return tokens |
| 124 | +``` |
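As a rough illustration of what "targeting `case_keywords`" could look like, the sketch below counts condition tokens but counts a run of consecutive case labels only once. The function name and its three-state machine are assumptions made for this example, not the extension's actual code, and tokens between a case keyword and its `:` are skipped, so a ternary inside a label is not handled.

```python
def mccabe_condition_count(tokens, conditions, case_keywords):
    """Count condition tokens, counting consecutive case labels once."""
    count = 0
    # 'normal'      : ordinary code
    # 'in_label'    : between a case keyword and its ':'
    # 'after_label' : immediately after a label's ':'
    state = 'normal'
    for tok in tokens:
        if state == 'in_label':
            if tok == ':':
                state = 'after_label'
            continue
        # A case keyword directly after another label is a fall-through label.
        fallthrough = state == 'after_label' and tok in case_keywords
        state = 'in_label' if tok in case_keywords else 'normal'
        if tok in conditions and not fallthrough:
            count += 1
    return count


BASE_CONDITIONS = {'if', 'for', 'while', '&&', '||', '?', 'catch', 'case'}
CASE_KEYWORDS = {'case'}

stacked = ['switch', '(', 'x', ')', '{',
           'case', '1', ':', 'case', '2', ':', 'case', '3', ':',
           'break', ';', '}']
separate = ['case', '1', ':', 'f', '(', ')', ';', 'break', ';',
            'case', '2', ':', 'g', '(', ')', ';']

print(mccabe_condition_count(stacked, BASE_CONDITIONS, CASE_KEYWORDS))
print(mccabe_condition_count(separate, BASE_CONDITIONS, CASE_KEYWORDS))
```

The state machine is still there, but the only language-specific input is the `case_keywords` set, so a reader that uses a different keyword would not require changes to the extension.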
| 125 | + |
| 126 | +## Language Variations Illustrated |
| 127 | + |
| 128 | +### Symbol-based Logical Operators |
| 129 | +``` |
| 130 | +C++/Java/C#: if (a && b || c) |
| 131 | +JavaScript:  if (a && b || c) |
| 132 | +Kotlin:      if (a && b || c) |
| 133 | +PHP:         if ($a && $b || $c) |
| 134 | +``` |
| 135 | + |
| 136 | +### Word-based Logical Operators |
| 137 | +``` |
| 138 | +Python:  if a and b or c: |
| 139 | +Ruby:    if a and b or c |
| 140 | +Perl:    if ($a and $b or $c)  # Also has && || |
| 141 | +Fortran: IF (a .AND. b .OR. c) |
| 142 | +PL/SQL:  IF a AND b OR c THEN |
| 143 | +``` |
| 144 | + |
| 145 | +### Mixed Operators (Dual Purpose) |
| 146 | +``` |
| 147 | +Perl: |
| 148 | +  if ($x && $y)      # Symbol form |
| 149 | +  if ($x and $y)     # Word form (lower precedence!) |
| 150 | + |
| 151 | +R: |
| 152 | +  if (a && b)        # Short-circuit (scalar) |
| 153 | +  result <- a & b    # Element-wise (vectorized) ⚠️ |
| 154 | +``` |
| 155 | + |
| 156 | +## The R Language Bug in Detail |
| 157 | + |
| 158 | +### Current R `_conditions`: |
| 159 | +```python |
| 160 | +_conditions = { |
| 161 | +    'if', 'for', 'while', 'switch',  # Control flow ✓ |
| 162 | +    '&&', '||',                      # Short-circuit logical ✓ |
| 163 | +    '&', '|',                        # Element-wise logical ⚠️ |
| 164 | +    'ifelse', 'tryCatch', 'try'      # Functions ⚠️ |
| 165 | +} |
| 166 | +``` |
| 167 | + |
| 168 | +### The Problem: |
| 169 | +```r |
| 170 | +# This SHOULD add to CCN (control flow decision): |
| 171 | +if (x > 0 && y > 0) { ... } # CCN: +2 (if, &&) |
| 172 | + |
| 173 | +# This probably SHOULD NOT (vectorized operation): |
| 174 | +flags <- (x > 0) & (y > 0) # CCN: +1? (just &) |
| 175 | +# This is element-wise operation on vectors, not a control flow decision! |
| 176 | + |
| 177 | +# This is debatable (function call): |
| 178 | +result <- ifelse(x > 0, 1, -1) # CCN: +1? (ifelse) |
| 179 | +# Is a function call a control flow decision? |
| 180 | +``` |
| 181 | + |
| 182 | +### Why It's Confusing: |
| 183 | +- `&&` and `&` look similar but have different semantics |
| 184 | +- `&&` is short-circuit (control flow) |
| 185 | +- `&` is vectorized (data operation) |
| 186 | +- Currently both add +1 to CCN |
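The effect of the element-wise operators can be sketched with a toy tokenizer. Everything below is an illustrative assumption: the regex is far simpler than a real R tokenizer, and `PROPOSED` is just one possible outcome (it drops `&`/`|` and leaves the debatable function names in place).

```python
import re

R_CONTROL_FLOW = {'if', 'for', 'while', 'switch'}
R_SHORT_CIRCUIT = {'&&', '||'}
R_ELEMENT_WISE = {'&', '|'}
R_FUNCTIONS = {'ifelse', 'tryCatch', 'try'}

# Set as it stands today, and one hypothetical corrected set.
CURRENT = R_CONTROL_FLOW | R_SHORT_CIRCUIT | R_ELEMENT_WISE | R_FUNCTIONS
PROPOSED = CURRENT - R_ELEMENT_WISE

# Toy tokenizer: two-character operators are tried before one-character ones.
TOKEN = re.compile(r"&&|\|\||[&|]|[A-Za-z_.][A-Za-z0-9_.]*")


def condition_hits(line, conditions):
    """Return the tokens of `line` that would each add +1 to CCN."""
    return [tok for tok in TOKEN.findall(line) if tok in conditions]


branch = "if (x > 0 && y > 0) { z <- 1 }"
vectorized = "flags <- (x > 0) & (y > 0)"

print(condition_hits(branch, CURRENT))       # if and && both count
print(condition_hits(vectorized, CURRENT))   # & counts today
print(condition_hits(vectorized, PROPOSED))  # no hit once & is excluded
```

Keeping the short-circuit and element-wise operators in separate named sets makes this a one-line change in the reader instead of an edit to a mixed literal.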
| 187 | + |
| 188 | +## Before and After Comparison |
| 189 | + |
| 190 | +### Before (Current State) |
| 191 | +```python |
| 192 | +# Base class |
| 193 | +class CodeReader: |
| 194 | +    _conditions = {'if', 'for', 'while', '&&', '||', '?', 'catch', 'case'} |
| 195 | + |
| 196 | +    def __init__(self, context): |
| 197 | +        self.conditions = copy(self._conditions)  # One big mixed set |
| 198 | + |
| 199 | +# Extension trying to remove logical operators |
| 200 | +class LizardNonStrict: |
| 201 | +    def __call__(self, tokens, reader): |
| 202 | +        reader.conditions -= set(['&&', '||', 'and', 'or'])  # Hardcoded! |
| 203 | +``` |
| 204 | + |
| 205 | +**Issues:** |
| 206 | +- ❌ All concepts mixed |
| 207 | +- ❌ Extensions use hardcoded lists |
| 208 | +- ❌ Unclear what each token represents |
| 209 | +- ❌ Hard to maintain |
| 210 | + |
| 211 | +### After (Proposed State) |
| 212 | +```python |
| 213 | +# Base class |
| 214 | +class CodeReader: |
| 215 | +    _control_flow_keywords = {'if', 'for', 'while', 'catch'} |
| 216 | +    _logical_operators = {'&&', '||'} |
| 217 | +    _case_keywords = {'case'} |
| 218 | +    _ternary_operators = {'?'} |
| 219 | + |
| 220 | +    def __init__(self, context): |
| 221 | +        # Combine for backward compatibility |
| 222 | +        self.conditions = (self._control_flow_keywords | |
| 223 | +                           self._logical_operators | |
| 224 | +                           self._case_keywords | |
| 225 | +                           self._ternary_operators) |
| 226 | +        # Also expose separately |
| 227 | +        self.control_flow_keywords = copy(self._control_flow_keywords) |
| 228 | +        self.logical_operators = copy(self._logical_operators) |
| 229 | +        self.case_keywords = copy(self._case_keywords) |
| 230 | +        self.ternary_operators = copy(self._ternary_operators) |
| 231 | + |
| 232 | +# Extension using semantic names |
| 233 | +class LizardNonStrict: |
| 234 | +    def __call__(self, tokens, reader): |
| 235 | +        reader.conditions -= reader.logical_operators  # Semantic! |
| 236 | +``` |
| 237 | + |
| 238 | +**Benefits:** |
| 239 | +- ✅ Clear separation of concepts |
| 240 | +- ✅ Extensions use semantic names |
| 241 | +- ✅ Self-documenting code |
| 242 | +- ✅ Easy to maintain |
| 243 | +- ✅ Backward compatible |
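The backward-compatibility claim can be checked mechanically: the union of the four sets should equal the old mixed set. The sketch below uses hypothetical `Sketch*` classes rather than lizard's real readers, and the split of the Python set into categories is an assumption made for this example.

```python
from copy import copy

OLD_BASE = {'if', 'for', 'while', '&&', '||', '?', 'catch', 'case'}
OLD_PYTHON = {'if', 'for', 'while', 'and', 'or', 'elif', 'except', 'finally'}


class SketchCodeReader:
    """Hypothetical base reader following the proposed layout."""
    _control_flow_keywords = {'if', 'for', 'while', 'catch'}
    _logical_operators = {'&&', '||'}
    _case_keywords = {'case'}
    _ternary_operators = {'?'}

    def __init__(self):
        # Per-instance copy so extensions can mutate it safely.
        self.logical_operators = copy(self._logical_operators)
        # The union builds a new set, so class attributes are never mutated.
        self.conditions = (self._control_flow_keywords
                           | self._logical_operators
                           | self._case_keywords
                           | self._ternary_operators)


class SketchPythonReader(SketchCodeReader):
    # Assumed categorization of the current Python set.
    _control_flow_keywords = {'if', 'for', 'while', 'elif', 'except', 'finally'}
    _logical_operators = {'and', 'or'}
    _case_keywords = set()
    _ternary_operators = set()


# Same combined set as before the refactoring.
assert SketchCodeReader().conditions == OLD_BASE
assert SketchPythonReader().conditions == OLD_PYTHON

# An extension mutating one instance does not leak into later instances.
reader = SketchPythonReader()
reader.conditions -= reader.logical_operators
assert 'and' not in reader.conditions
assert SketchPythonReader().conditions == OLD_PYTHON
```

A check of this shape per language reader would give each phase of the migration a simple regression guard.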
| 244 | + |
| 245 | +## Impact Summary |
| 246 | + |
| 247 | +### Files to Update |
| 248 | +``` |
| 249 | +Phase 1: Infrastructure |
| 250 | +  ✏️ lizard_languages/code_reader.py (base class) |
| 251 | + |
| 252 | +Phase 2-3: Language Readers (23 files) |
| 253 | +  ✏️ lizard_languages/python.py |
| 254 | +  ✏️ lizard_languages/javascript.py |
| 255 | +  ✏️ lizard_languages/java.py |
| 256 | +  ... (20 more) |
| 257 | + |
| 258 | +Phase 4: Extensions (4 files) |
| 259 | +  ✏️ lizard_ext/lizardnonstrict.py |
| 260 | +  ✏️ lizard_ext/lizardmccabe.py |
| 261 | +  ✏️ lizard_ext/lizardmodified.py |
| 262 | +  ✏️ lizard_ext/lizardcomplextags.py (review) |
| 263 | + |
| 264 | +Phase 5: Bug Fixes |
| 265 | +  🐛 Fix R language element-wise operators |
| 266 | +  🐛 Fix Rust incorrect 'case' keyword |
| 267 | +  🐛 Fix Erlang '?' meaning |
| 268 | +  🐛 Fix Perl duplicate definitions |
| 269 | +``` |
| 270 | + |
| 271 | +### Testing Impact |
| 272 | +``` |
| 273 | +✅ All existing tests must pass (backward compatible) |
| 274 | +✅ Add new tests for bug fixes |
| 275 | +✅ Extension tests must pass |
| 276 | +✅ Integration tests with real code |
| 277 | +``` |
| 278 | + |
| 279 | +### Documentation Impact |
| 280 | +``` |
| 281 | +📚 Update language implementation guide |
| 282 | +📚 Add migration guide for custom readers |
| 283 | +📚 Update theory documentation if needed |
| 284 | +``` |
| 285 | + |
| 286 | +## Conclusion |
| 287 | + |
| 288 | +This refactoring: |
| 289 | +- **Fixes conceptual confusion** by separating mixed concepts |
| 290 | +- **Enables better extensions** by providing semantic categorization |
| 291 | +- **Fixes real bugs** in R, Rust, Erlang, and Perl |
| 292 | +- **Maintains compatibility** with existing code |
| 293 | +- **Improves maintainability** for future development |
| 294 | + |
| 295 | +The change can land incrementally, one phase at a time, with the existing test suite guarding each step. |
| 296 | + |