Commit 0fffbfb

feat: Enhance TODO list with completed items and detailed refactoring phases
This update adds a section for completed tasks in the TODO list and outlines a comprehensive refactoring plan for handling conditions across various language readers. The plan is divided into multiple phases, detailing research, infrastructure changes, updates to language readers, testing, and documentation. This structured approach aims to improve the handling of logical operators and conditions in the codebase.
1 parent 592ae58 commit 0fffbfb

5 files changed: +1271 -1 lines changed

Lines changed: 296 additions & 0 deletions
@@ -0,0 +1,296 @@
# Visual Illustration of the Conditions Mixing Problem

## The Current Problem

### What We Have Now (Mixed Concepts)
```
_conditions = {'if', 'for', 'while', '&&', '||', '?', 'catch', 'case'}
                ↓      ↓       ↓      ↓     ↓     ↓      ↓       ↓
               [---------------- all mixed together -----------------]
```

### What We Should Have (Separated Concepts)

```
_control_flow_keywords = {'if', 'for', 'while', 'catch'}
                           ↓      ↓       ↓        ↓
                          [- Control Flow Structures -]

_logical_operators = {'&&', '||'}
                       ↓     ↓
                  [- Logical Ops -]

_case_keywords = {'case'}

            [- Case Labels -]

_ternary_operators = {'?'}

                 [- Ternary -]
```
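
The separation sketched in the diagram can be expressed as a small lookup table. This is a minimal sketch only: `TOKEN_CATEGORIES`, `categorize`, and `ALL_CONDITIONS` are hypothetical names chosen for illustration and are not part of lizard's API.

```python
# Hypothetical category table based on the diagram above; these names are
# illustrative and are not an existing lizard API.
TOKEN_CATEGORIES = {
    "control_flow": frozenset({"if", "for", "while", "catch"}),
    "logical_operator": frozenset({"&&", "||"}),
    "case": frozenset({"case"}),
    "ternary": frozenset({"?"}),
}


def categorize(token):
    """Return the category a token belongs to, or None if it is not a condition."""
    for category, members in TOKEN_CATEGORIES.items():
        if token in members:
            return category
    return None


# The union of all categories reproduces the original mixed set.
ALL_CONDITIONS = frozenset().union(*TOKEN_CATEGORIES.values())
```

With a table like this, a caller can ask what kind of condition a token is instead of only whether it is one.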

## Real-World Examples

### Example 1: Python Code
```python
def complex_function(x, y, z):
    if x > 0 and y > 0 or z < 10:       # CCN: +3 (if, and, or)
        for i in range(10):             # CCN: +1 (for)
            while i < 5:                # CCN: +1 (while)
                try:
                    process()
                except Exception:       # CCN: +1 (except)
                    pass
    return result if result else 0      # CCN: +0 (no '?' in Python)
# Total CCN: 7 (1 base + 6)
```

**Current Python `_conditions`:**
```python
_conditions = {'if', 'for', 'while', 'and', 'or', 'elif', 'except', 'finally'}
#               🔀     🔀      🔀      🔗    🔗     🔀       🔀        🔀
# Legend: 🔀 = control flow, 🔗 = logical operator
```

**Problem**: Everything is mixed together, so extensions can't easily target just `and`/`or`.
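
To make the problem concrete, the sketch below splits the current Python set into two groups and counts matching tokens on the `if` line of the example. The names `PYTHON_LOGICAL_OPERATORS`, `PYTHON_CONTROL_FLOW`, and `count_conditions` are assumptions made for illustration, and the regex tokenizer is a simplification, not lizard's tokenizer.

```python
import re

# The set currently used by the Python reader, as quoted above.
CURRENT_PYTHON_CONDITIONS = {
    'if', 'for', 'while', 'and', 'or', 'elif', 'except', 'finally',
}

# Hypothetical split: only these two tokens are logical operators.
PYTHON_LOGICAL_OPERATORS = {'and', 'or'}
PYTHON_CONTROL_FLOW = CURRENT_PYTHON_CONDITIONS - PYTHON_LOGICAL_OPERATORS


def count_conditions(code, conditions):
    """Count tokens in `code` that appear in `conditions` (simplified tokenizer)."""
    tokens = re.findall(r"\w+|[^\s\w]", code)
    return sum(1 for token in tokens if token in conditions)


line = "if x > 0 and y > 0 or z < 10:"
strict = count_conditions(line, CURRENT_PYTHON_CONDITIONS)   # if, and, or
control_only = count_conditions(line, PYTHON_CONTROL_FLOW)   # if
```

Once the logical operators live in their own set, an extension can subtract exactly that set without knowing which spellings a given language uses.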

### Example 2: C++ Code
```cpp
int complex_function(int x, int y) {
    if (x > 0 && y > 0 || x < -10) {    // CCN: +3 (if, &&, ||)
        switch(x) {                     // CCN: +0 (switch itself)
            case 1:                     // CCN: +1 (case)
            case 2:                     // CCN: +1 (case)
            case 3:                     // CCN: +1 (case)
                break;
        }
    }
    return (x > 0) ? y : -y;            // CCN: +1 (ternary ?)
}
// Total CCN: 8 (1 base + 7)
```

**Current Base `_conditions`:**
```python
_conditions = {'if', 'for', 'while', '&&', '||', '?', 'catch', 'case'}
#               🔀     🔀      🔀     🔗    🔗    ❓     🔀      🔢
# Legend: 🔀 = control flow, 🔗 = logical operator, ❓ = ternary, 🔢 = case label
```

**Problem**: With a single set, `case` labels cannot be counted separately from the other constructs.
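
A per-category breakdown shows what the single set hides. The sketch below tokenizes the C++ function above with a simplified regex (an assumption; lizard's real tokenizer is more involved) and tallies each category separately. `CATEGORIES`, `tokenize`, and `ccn_breakdown` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical category sets mirroring the proposed split.
CATEGORIES = {
    "control_flow": {"if", "for", "while", "catch"},
    "logical": {"&&", "||"},
    "case": {"case"},
    "ternary": {"?"},
}

SOURCE = """
int complex_function(int x, int y) {
    if (x > 0 && y > 0 || x < -10) {
        switch (x) {
            case 1:
            case 2:
            case 3:
                break;
        }
    }
    return (x > 0) ? y : -y;
}
"""


def tokenize(code):
    """Split code into words, the two-character operators && and ||, and single symbols."""
    return re.findall(r"&&|\|\||\w+|[^\s\w]", code)


def ccn_breakdown(code):
    """Count condition tokens per category."""
    counts = Counter()
    for token in tokenize(code):
        for name, members in CATEGORIES.items():
            if token in members:
                counts[name] += 1
    return counts


breakdown = ccn_breakdown(SOURCE)
total_ccn = 1 + sum(breakdown.values())  # 1 for the function itself
```

For this snippet the sketch reports one control-flow token, two logical operators, three case labels, and one ternary operator, so an extension that only cares about `case` can read that one number directly.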

## How Extensions Are Affected

### Extension: lizardnonstrict.py
**Goal**: Count complexity WITHOUT logical operators (only control flow)

**Current Implementation (Hacky):**
```python
def __call__(self, tokens, reader):
    reader.conditions -= set(['&&', '||', 'and', 'or'])  # Remove by hardcoding
    return tokens
```

**Problems:**
- Hard-coded list of operators
- Must know every language's logical operators
- Easy to miss new operators

**With Separation (Clean):**
```python
def __call__(self, tokens, reader):
    reader.conditions -= reader.logical_operators  # Semantic!
    return tokens
```
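
The semantic version can be exercised against stand-in readers for two languages. This is a sketch under stated assumptions: `StubReader` and `NonStrict` are hypothetical classes written for this example and do not reproduce lizard's real reader or extension classes.

```python
class StubReader:
    """Hypothetical stand-in for a language reader; not lizard's real class."""

    def __init__(self, control_flow, logical_operators):
        self.logical_operators = set(logical_operators)
        self.conditions = set(control_flow) | self.logical_operators


class NonStrict:
    """Sketch of the extension: drop whatever the reader calls a logical operator."""

    def __call__(self, tokens, reader):
        reader.conditions -= reader.logical_operators
        return tokens


c_like = StubReader({"if", "for", "while", "catch"}, {"&&", "||"})
python_like = StubReader({"if", "for", "while", "elif", "except"}, {"and", "or"})

extension = NonStrict()
for reader in (c_like, python_like):
    extension([], reader)
```

The extension never names `&&` or `and`; each reader supplies its own operators, so a new language needs no change to the extension.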

### Extension: lizardmccabe.py
**Goal**: Count only the FIRST of consecutive `case` labels in a switch (McCabe's definition)

**Current Implementation (Complex State Machine):**
```python
def _after_a_case(self, token):
    if token == "case":
        self.context.add_condition(-1)  # Subtract for consecutive cases
```

**Problems:**
- Needs a complex state machine
- Hard to distinguish 'case' from other conditions
- Error-prone

**With Separation (Simpler):**
```python
def __call__(self, tokens, reader):
    # Could potentially simplify by targeting case_keywords specifically
    # (implementation still needs state machine, but intent is clearer)
    return tokens  # Placeholder body so the sketch is valid Python
```
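
As a rough illustration of what targeting `case_keywords` could look like, the sketch below counts runs of consecutive case labels over a pre-tokenized list. `count_case_groups` is a hypothetical helper, not the extension's actual code, and it assumes each label ends at the first `:` token after the keyword.

```python
def count_case_groups(tokens, case_keywords=frozenset({"case"})):
    """Count runs of consecutive case labels; each run counts once."""
    groups = 0
    in_label = False        # between a case keyword and its ':'
    prev_was_label = False  # the previous construct was a bare case label
    for token in tokens:
        if in_label:
            if token == ":":
                in_label = False
                prev_was_label = True
            continue
        if token in case_keywords:
            if not prev_was_label:
                groups += 1  # first label of a new run
            in_label = True
        else:
            prev_was_label = False  # a statement ends the run of labels
    return groups


# case 1: case 2: case 3: break;
FALL_THROUGH = ["case", "1", ":", "case", "2", ":", "case", "3", ":", "break", ";"]
# case 1: foo(); break; case 2: bar();
SEPARATE = ["case", "1", ":", "foo", "(", ")", ";", "break", ";",
            "case", "2", ":", "bar", "(", ")", ";"]
```

A small state machine is still needed, as noted above, but it only has to know about the case set rather than filter `case` out of every other condition.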

## Language Variations Illustrated

### Symbol-based Logical Operators
```
C++/Java/C#:  if (a && b || c)
JavaScript:   if (a && b || c)
Kotlin:       if (a && b || c)
PHP:          if ($a && $b || $c)
```

### Word-based Logical Operators
```
Python:   if a and b or c:
Ruby:     if a and b or c
Perl:     if ($a and $b or $c)   # Also has && ||
Fortran:  IF (a .AND. b .OR. c)
PL/SQL:   IF a AND b OR c THEN
```

### Mixed Operators (Dual Purpose)
```
Perl:
  if ($x && $y)          # Symbol form
  if ($x and $y)         # Word form (lower precedence!)

R:
  if (a && b)            # Short-circuit (scalar)
  result <- a & b        # Element-wise (vectorized) ⚠️
```

## The R Language Bug in Detail

### Current R `_conditions`:
```python
_conditions = {
    'if', 'for', 'while', 'switch',   # Control flow ✓
    '&&', '||',                       # Short-circuit logical ✓
    '&', '|',                         # Element-wise logical ⚠️
    'ifelse', 'tryCatch', 'try'       # Functions ⚠️
}
```

### The Problem:
```r
# This SHOULD add to CCN (control flow decision):
if (x > 0 && y > 0) { ... }        # CCN: +2 (if, &&)

# This probably SHOULD NOT (vectorized operation):
flags <- (x > 0) & (y > 0)         # CCN: +1? (just &)
# This is an element-wise operation on vectors, not a control flow decision!

# This is debatable (function call):
result <- ifelse(x > 0, 1, -1)     # CCN: +1? (ifelse)
# Is a function call a control flow decision?
```

### Why It's Confusing:
- `&&` and `&` look similar but have different semantics
  - `&&` is short-circuit (control flow)
  - `&` is vectorized (data operation)
- Currently both add +1 to CCN
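
One possible way to express the distinction is to keep the element-wise operators in their own set and leave them out of the count by default. The sketch below is an assumption about how that could look, not a decided design: the set names, `tokenize_r`, and the `count_element_wise` flag are all hypothetical, and the debatable function names (`ifelse`, `tryCatch`, `try`) are left out entirely.

```python
import re

# Hypothetical split of the R tokens discussed above.
R_CONTROL_FLOW = frozenset({"if", "for", "while", "switch"})
R_SHORT_CIRCUIT = frozenset({"&&", "||"})
R_ELEMENT_WISE = frozenset({"&", "|"})  # vectorized; excluded by default here


def tokenize_r(code):
    """Simplified tokenizer: &&, ||, and <- are kept as single tokens."""
    return re.findall(r"&&|\|\||<-|\w+|[^\s\w]", code)


def count_ccn(code, count_element_wise=False):
    """Return 1 plus the number of condition tokens found in `code`."""
    conditions = R_CONTROL_FLOW | R_SHORT_CIRCUIT
    if count_element_wise:
        conditions = conditions | R_ELEMENT_WISE
    return 1 + sum(1 for token in tokenize_r(code) if token in conditions)


branching = "if (x > 0 && y > 0) { z }"
vectorized = "flags <- (x > 0) & (y > 0)"
```

With the flag off, the vectorized assignment adds nothing beyond the base count; with it on, the sketch reproduces the current behavior in which `&` adds one.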

## Before and After Comparison

### Before (Current State)
```python
from copy import copy

# Base class
class CodeReader:
    _conditions = {'if', 'for', 'while', '&&', '||', '?', 'catch', 'case'}

    def __init__(self, context):
        self.conditions = copy(self._conditions)  # One big mixed set

# Extension trying to remove logical operators
class LizardNonStrict:
    def __call__(self, tokens, reader):
        reader.conditions -= set(['&&', '||', 'and', 'or'])  # Hardcoded!
        return tokens
```

**Issues:**
- ❌ All concepts mixed
- ❌ Extensions use hardcoded lists
- ❌ Unclear what each token represents
- ❌ Hard to maintain

### After (Proposed State)
```python
from copy import copy

# Base class
class CodeReader:
    _control_flow_keywords = {'if', 'for', 'while', 'catch'}
    _logical_operators = {'&&', '||'}
    _case_keywords = {'case'}
    _ternary_operators = {'?'}

    def __init__(self, context):
        # Combine for backward compatibility
        self.conditions = (self._control_flow_keywords |
                           self._logical_operators |
                           self._case_keywords |
                           self._ternary_operators)
        # Also expose separately
        self.control_flow_keywords = copy(self._control_flow_keywords)
        self.logical_operators = copy(self._logical_operators)
        self.case_keywords = copy(self._case_keywords)
        self.ternary_operators = copy(self._ternary_operators)

# Extension using semantic names
class LizardNonStrict:
    def __call__(self, tokens, reader):
        reader.conditions -= reader.logical_operators  # Semantic!
        return tokens
```

**Benefits:**
- ✅ Clear separation of concepts
- ✅ Extensions use semantic names
- ✅ Self-documenting code
- ✅ Easy to maintain
- ✅ Backward compatible
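
The backward-compatibility claim can be checked directly: the combined set should equal the old one, and changing one reader instance should not leak into the class or into other instances. The sketch below restates the proposed base class in trimmed form; `LEGACY_CONDITIONS` is a name introduced here for the comparison, and the `context=None` default is an assumption made so the example runs standalone.

```python
from copy import copy

# The mixed set from the current base class, kept for comparison.
LEGACY_CONDITIONS = {'if', 'for', 'while', '&&', '||', '?', 'catch', 'case'}


class CodeReader:
    _control_flow_keywords = {'if', 'for', 'while', 'catch'}
    _logical_operators = {'&&', '||'}
    _case_keywords = {'case'}
    _ternary_operators = {'?'}

    def __init__(self, context=None):
        # The union builds a new set, so instances never share it.
        self.conditions = (self._control_flow_keywords |
                           self._logical_operators |
                           self._case_keywords |
                           self._ternary_operators)
        self.logical_operators = copy(self._logical_operators)


# Mutate one reader the way the non-strict extension would.
reader = CodeReader()
reader.conditions -= reader.logical_operators

# A reader created afterwards is unaffected.
fresh = CodeReader()
```

Because `conditions` still exists and still holds the same tokens by default, code that only reads or mutates `reader.conditions` keeps working unchanged.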

## Impact Summary

### Files to Update
```
Phase 1: Infrastructure
  ✏️ lizard_languages/code_reader.py (base class)

Phase 2-3: Language Readers (23 files)
  ✏️ lizard_languages/python.py
  ✏️ lizard_languages/javascript.py
  ✏️ lizard_languages/java.py
  ... (20 more)

Phase 4: Extensions (4 files)
  ✏️ lizard_ext/lizardnonstrict.py
  ✏️ lizard_ext/lizardmccabe.py
  ✏️ lizard_ext/lizardmodified.py
  ✏️ lizard_ext/lizardcomplextags.py (review)

Phase 5: Bug Fixes
  🐛 Fix R language element-wise operators
  🐛 Fix Rust incorrect 'case' keyword
  🐛 Fix Erlang '?' meaning
  🐛 Fix Perl duplicate definitions
```

### Testing Impact
```
✅ All existing tests must pass (backward compatible)
✅ Add new tests for bug fixes
✅ Extension tests must pass
✅ Integration tests with real code
```

### Documentation Impact
```
📚 Update language implementation guide
📚 Add migration guide for custom readers
📚 Update theory documentation if needed
```

## Conclusion

This refactoring:
- **Fixes conceptual confusion** by separating mixed concepts
- **Enables better extensions** by providing semantic categorization
- **Fixes real bugs** in R, Rust, Erlang, and Perl
- **Maintains compatibility** with existing code
- **Improves maintainability** for future development

The implementation is straightforward and can be done incrementally with full test coverage.
