Idea: Cleaning up unused translations#2635
Conversation
| unused -= to_remove | ||
| if not unused: | ||
| return unused | ||
| except Exception: |
Check notice
Code scanning / CodeQL
Empty except Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
In general, empty except blocks should either (a) be removed so exceptions propagate, or (b) be replaced with handling that logs or compensates for the error, and possibly narrows the exception type. Here, we want to preserve existing behavior as much as possible: continue scanning other files even if one file fails, but avoid completely silent failures.
The best fix is to keep the try/except but:
- Narrow the exception to expected I/O/decoding errors (
OSError,UnicodeDecodeError) instead of catchingException. - Add minimal logging to
stderrindicating that a particular file could not be read and why.
This maintains the current control flow (the function continues scanning other files and ultimately returnsunused) while making problems visible to the user. We can implement this entirely insidefind_unused_translationswithout changing any imports or external behavior. Concretely, inscripts/cleanup_translations.py, lines 89–102 should be updated so that theexceptblock prints an informative message, and the caught exception type is narrowed.
| @@ -98,8 +98,9 @@ | ||
|
|
||
| if not unused: | ||
| return unused | ||
| except Exception: | ||
| pass | ||
| except (OSError, UnicodeDecodeError) as e: | ||
| # Skip files that cannot be read, but log the problem for visibility. | ||
| print(f"Warning: could not read file {file_path}: {e}", file=sys.stderr) | ||
|
|
||
| return unused | ||
|
|
There was a problem hiding this comment.
Pull request overview
Adds an interactive utility to identify and remove unused French translation strings by combining Babel extraction results with a fallback text search, and wires it into the Makefile.
Changes:
- Introduces
scripts/cleanup_translations.pyto extract used strings via Babel and detect unused keys. - Adds an interactive deletion flow to remove unused rows from
app/translations/csv/fr.csv. - Adds
make cleanup-translationstarget to run the script.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| scripts/cleanup_translations.py | New script to detect and optionally delete unused translation strings using Babel + source search. |
| Makefile | Adds a new cleanup-translations target to run the cleanup script. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return set() | ||
|
|
||
|
|
||
| def find_unused_translations(keys, search_dirs): | ||
| # First, get keys that are used in code via Babel (covers Python/HTML) | ||
| babel_keys = get_keys_from_babel() |
There was a problem hiding this comment.
On any Babel extraction error, get_keys_from_babel() returns an empty set, which makes unused = set(keys) - babel_keys - extra_keys treat almost everything as unused and can lead to mass deletion. Safer behavior is to fail fast (raise/exit non-zero) or return a sentinel (e.g., None) and have find_unused_translations abort without offering deletion when extraction fails.
| return set() | |
| def find_unused_translations(keys, search_dirs): | |
| # First, get keys that are used in code via Babel (covers Python/HTML) | |
| babel_keys = get_keys_from_babel() | |
| return None | |
| def find_unused_translations(keys, search_dirs): | |
| # First, get keys that are used in code via Babel (covers Python/HTML) | |
| babel_keys = get_keys_from_babel() | |
| if babel_keys is None: | |
| print("Aborting: unable to extract keys via Babel; not proceeding with cleanup.") | |
| sys.exit(1) |
| subprocess.run( | ||
| ["poetry", "run", "pybabel", "extract", "-F", "babel.cfg", "-k", "_l", "-o", "/tmp/cleanup_messages.po", "."], | ||
| check=True, | ||
| capture_output=True, | ||
| ) | ||
|
|
||
| # 2. Run po2csv | ||
| subprocess.run( | ||
| ["poetry", "run", "po2csv", "/tmp/cleanup_messages.po", "/tmp/cleanup_messages.csv"], check=True, capture_output=True | ||
| ) |
There was a problem hiding this comment.
The script uses fixed /tmp/cleanup_messages.* paths and only cleans them up on the success path. This can clash across concurrent runs (e.g., parallel CI/jobs) and can leak temp files when any subprocess step fails. Use tempfile to create unique temp paths and ensure cleanup in a finally block (also consider specifying encoding='utf-8' when reading the temp CSV for consistent behavior).
|
|
||
| # 3. Read extracted keys | ||
| extracted_keys = set() | ||
| with open("/tmp/cleanup_messages.csv", newline="") as csvfile: |
There was a problem hiding this comment.
The script uses fixed /tmp/cleanup_messages.* paths and only cleans them up on the success path. This can clash across concurrent runs (e.g., parallel CI/jobs) and can leak temp files when any subprocess step fails. Use tempfile to create unique temp paths and ensure cleanup in a finally block (also consider specifying encoding='utf-8' when reading the temp CSV for consistent behavior).
| os.remove("/tmp/cleanup_messages.po") | ||
| os.remove("/tmp/cleanup_messages.csv") |
There was a problem hiding this comment.
The script uses fixed /tmp/cleanup_messages.* paths and only cleans them up on the success path. This can clash across concurrent runs (e.g., parallel CI/jobs) and can leak temp files when any subprocess step fails. Use tempfile to create unique temp paths and ensure cleanup in a finally block (also consider specifying encoding='utf-8' when reading the temp CSV for consistent behavior).
| def get_translation_keys(csv_path): | ||
| keys = [] | ||
| if not os.path.exists(csv_path): | ||
| return keys | ||
|
|
||
| with open(csv_path, mode="r", encoding="utf-8") as f: | ||
| reader = csv.DictReader(f) | ||
| for row in reader: | ||
| if row["source"]: | ||
| if row["source"].startswith("!/!/"): | ||
| continue | ||
| keys.append(row["source"]) | ||
| return keys |
There was a problem hiding this comment.
The log message claims the script found 'unique' source strings, but get_translation_keys returns a list and can include duplicates. Either deduplicate (e.g., return a set or de-dupe before printing) or adjust the message to not claim uniqueness.
|
|
||
| print(f"Loading translations from {csv_path}...") | ||
| keys = get_translation_keys(csv_path) | ||
| print(f"Found {len(keys)} unique translation source strings.") |
There was a problem hiding this comment.
The log message claims the script found 'unique' source strings, but get_translation_keys returns a list and can include duplicates. Either deduplicate (e.g., return a set or de-dupe before printing) or adjust the message to not claim uniqueness.
| print(f"Found {len(keys)} unique translation source strings.") | |
| print(f"Found {len(keys)} translation source strings.") |
| # Second pass: search remaining unused keys in JS and CSS files | ||
| for directory in search_dirs: | ||
| for root, dirs, files in os.walk(directory): | ||
| for file in files: | ||
| if file.endswith((".js", ".css", ".scss", ".txt", ".md")): | ||
| file_path = os.path.join(root, file) | ||
| if "translations/csv" in file_path or "node_modules" in file_path: | ||
| continue | ||
| try: | ||
| with open(file_path, "r", encoding="utf-8") as f: | ||
| content = f.read() | ||
|
|
||
| to_remove = set() | ||
| for key in unused: | ||
| if key in content: | ||
| to_remove.add(key) | ||
|
|
||
| unused -= to_remove | ||
| if not unused: | ||
| return unused |
There was a problem hiding this comment.
This does an O(number_of_files × number_of_candidate_keys) substring scan, which can get expensive if unused is large. A more efficient approach is to aggregate searchable content once (or stream files) and check each key once, or use a multi-pattern search approach (e.g., building an index, or a compiled matcher) to avoid re-scanning the same content for every key.
| # Also need to consider hardcoded constant keys used in test-translations.py | ||
| extra_keys = set( | ||
| [ | ||
| "English Government of Canada signature", | ||
| "French Government of Canada signature", | ||
| "Empty", | ||
| "1 template", | ||
| "Number must have 10 digits", | ||
| "bad invitation link", | ||
| "invitation expired", | ||
| "password", | ||
| "Your service already uses ", | ||
| "Try again. Something’s wrong with this code", | ||
| "Code already sent, wait 10 seconds", | ||
| "You cannot delete a default email reply to address if other reply to addresses exist", | ||
| "Code has expired", | ||
| "Code already sent", | ||
| "Code has already been used", | ||
| "Code not found", | ||
| "as an email reply-to address.", | ||
| "You cannot remove the only user for a service", | ||
| "Cannot send to international mobile numbers", | ||
| ] | ||
| ) |
There was a problem hiding this comment.
Hardcoding extra_keys creates a drift risk against the real source of truth (the constants used by test-translations). Consider loading these from a shared module/config (or importing the list from the test-translations implementation if it’s in Python), so future translation key additions don’t require updating multiple places.
🧪 Review environmenthttps://bw5fulsda4vf5476xpfzrmclwy0yhdcl.lambda-url.ca-central-1.on.aws/ |
| if temp_file and os.path.exists(temp_file): | ||
| try: | ||
| os.remove(temp_file) | ||
| except Exception: |
Check notice
Code scanning / CodeQL
Empty except Note
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 1 month ago
In general, to fix an empty except block, you should either narrow the caught exception type and handle it appropriately (log it, clean up, re-raise, etc.) or, if ignoring is truly intended, add a clear comment explaining why it is safe to do so. Here, the block is in a finally used to delete temporary files; a good fix is to log a warning when cleanup fails while still allowing the function to proceed without raising a new exception.
The best way to fix this without changing existing functionality is to replace the bare pass in the except Exception: inside the finally block of get_keys_from_babel with a print that reports the failure to remove the temp file. This preserves the fact that cleanup errors do not propagate (we still don’t re-raise) but avoids completely swallowing the error. We don’t need new imports since print is already available. Concretely, in scripts/cleanup_translations.py, lines 64–67 should be updated so that the except block prints a message including the filename and the exception instead of doing nothing.
| @@ -63,8 +63,9 @@ | ||
| if temp_file and os.path.exists(temp_file): | ||
| try: | ||
| os.remove(temp_file) | ||
| except Exception: | ||
| pass | ||
| except Exception as cleanup_error: | ||
| # Best-effort cleanup: log the failure but do not mask original exceptions | ||
| print(f"Warning: failed to remove temporary file '{temp_file}': {cleanup_error}") | ||
|
|
||
|
|
||
| def find_unused_translations(keys, search_dirs): |
Summary | Résumé
We have a lot of translated content strings that we don't use anymore. Mostly from the days where content was hardcoded. Do we want to clean this up? This script can do it automatically, and we can always run
make test-translationsas a safety measure.Test instructions | Instructions pour tester la modification
run
make cleanup-translationsIt might find unused translations, and prompt you to confirm deletion
run
make test-translationsIt should not find missing translations