Skip to content

Idea: Cleaning up unused translations#2635

Open
amazingphilippe wants to merge 3 commits intomainfrom
debt/unused-translation-script
Open

Idea: Cleaning up unused translations#2635
amazingphilippe wants to merge 3 commits intomainfrom
debt/unused-translation-script

Conversation

@amazingphilippe
Copy link
Copy Markdown
Contributor

Summary | Résumé

We have a lot of translated content strings that we don't use anymore. Mostly from the days where content was hardcoded. Do we want to clean this up? This script can do it automatically, and we can always run make test-translations as a safety measure.

Test instructions | Instructions pour tester la modification

run make cleanup-translations
It might find unused translations, and prompt you to confirm deletion
run make test-translations
It should not find missing translations

unused -= to_remove
if not unused:
return unused
except Exception:

Check notice

Code scanning / CodeQL

Empty except Note

'except' clause does nothing but pass and there is no explanatory comment.

Copilot Autofix

AI about 1 month ago

In general, empty except blocks should either (a) be removed so exceptions propagate, or (b) be replaced with handling that logs or compensates for the error, and possibly narrows the exception type. Here, we want to preserve existing behavior as much as possible: continue scanning other files even if one file fails, but avoid completely silent failures.

The best fix is to keep the try/except but:

  • Narrow the exception to expected I/O/decoding errors (OSError, UnicodeDecodeError) instead of catching Exception.
  • Add minimal logging to stderr indicating that a particular file could not be read and why.
    This maintains the current control flow (the function continues scanning other files and ultimately returns unused) while making problems visible to the user. We can implement this entirely inside find_unused_translations without changing any imports or external behavior. Concretely, in scripts/cleanup_translations.py, lines 89–102 should be updated so that the except block prints an informative message, and the caught exception type is narrowed.
Suggested changeset 1
scripts/cleanup_translations.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/scripts/cleanup_translations.py b/scripts/cleanup_translations.py
--- a/scripts/cleanup_translations.py
+++ b/scripts/cleanup_translations.py
@@ -98,8 +98,9 @@
 
                             if not unused:
                                 return unused
-                    except Exception:
-                        pass
+                    except (OSError, UnicodeDecodeError) as e:
+                        # Skip files that cannot be read, but log the problem for visibility.
+                        print(f"Warning: could not read file {file_path}: {e}", file=sys.stderr)
 
     return unused
 
EOF
@@ -98,8 +98,9 @@

if not unused:
return unused
except Exception:
pass
except (OSError, UnicodeDecodeError) as e:
# Skip files that cannot be read, but log the problem for visibility.
print(f"Warning: could not read file {file_path}: {e}", file=sys.stderr)

return unused

Copilot is powered by AI and may make mistakes. Always verify output.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an interactive utility to identify and remove unused French translation strings by combining Babel extraction results with a fallback text search, and wires it into the Makefile.

Changes:

  • Introduces scripts/cleanup_translations.py to extract used strings via Babel and detect unused keys.
  • Adds an interactive deletion flow to remove unused rows from app/translations/csv/fr.csv.
  • Adds make cleanup-translations target to run the script.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 8 comments.

File Description
scripts/cleanup_translations.py New script to detect and optionally delete unused translation strings using Babel + source search.
Makefile Adds a new cleanup-translations target to run the cleanup script.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/cleanup_translations.py Outdated
Comment on lines +52 to +57
return set()


def find_unused_translations(keys, search_dirs):
# First, get keys that are used in code via Babel (covers Python/HTML)
babel_keys = get_keys_from_babel()
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On any Babel extraction error, get_keys_from_babel() returns an empty set, which makes unused = set(keys) - babel_keys - extra_keys treat almost everything as unused and can lead to mass deletion. Safer behavior is to fail fast (raise/exit non-zero) or return a sentinel (e.g., None) and have find_unused_translations abort without offering deletion when extraction fails.

Suggested change
return set()
def find_unused_translations(keys, search_dirs):
# First, get keys that are used in code via Babel (covers Python/HTML)
babel_keys = get_keys_from_babel()
return None
def find_unused_translations(keys, search_dirs):
# First, get keys that are used in code via Babel (covers Python/HTML)
babel_keys = get_keys_from_babel()
if babel_keys is None:
print("Aborting: unable to extract keys via Babel; not proceeding with cleanup.")
sys.exit(1)

Copilot uses AI. Check for mistakes.
Comment thread scripts/cleanup_translations.py Outdated
Comment on lines +27 to +36
subprocess.run(
["poetry", "run", "pybabel", "extract", "-F", "babel.cfg", "-k", "_l", "-o", "/tmp/cleanup_messages.po", "."],
check=True,
capture_output=True,
)

# 2. Run po2csv
subprocess.run(
["poetry", "run", "po2csv", "/tmp/cleanup_messages.po", "/tmp/cleanup_messages.csv"], check=True, capture_output=True
)
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script uses fixed /tmp/cleanup_messages.* paths and only cleans them up on the success path. This can clash across concurrent runs (e.g., parallel CI/jobs) and can leak temp files when any subprocess step fails. Use tempfile to create unique temp paths and ensure cleanup in a finally block (also consider specifying encoding='utf-8' when reading the temp CSV for consistent behavior).

Copilot uses AI. Check for mistakes.
Comment thread scripts/cleanup_translations.py Outdated

# 3. Read extracted keys
extracted_keys = set()
with open("/tmp/cleanup_messages.csv", newline="") as csvfile:
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script uses fixed /tmp/cleanup_messages.* paths and only cleans them up on the success path. This can clash across concurrent runs (e.g., parallel CI/jobs) and can leak temp files when any subprocess step fails. Use tempfile to create unique temp paths and ensure cleanup in a finally block (also consider specifying encoding='utf-8' when reading the temp CSV for consistent behavior).

Copilot uses AI. Check for mistakes.
Comment thread scripts/cleanup_translations.py Outdated
Comment on lines +46 to +47
os.remove("/tmp/cleanup_messages.po")
os.remove("/tmp/cleanup_messages.csv")
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script uses fixed /tmp/cleanup_messages.* paths and only cleans them up on the success path. This can clash across concurrent runs (e.g., parallel CI/jobs) and can leak temp files when any subprocess step fails. Use tempfile to create unique temp paths and ensure cleanup in a finally block (also consider specifying encoding='utf-8' when reading the temp CSV for consistent behavior).

Copilot uses AI. Check for mistakes.
Comment on lines +7 to +19
def get_translation_keys(csv_path):
keys = []
if not os.path.exists(csv_path):
return keys

with open(csv_path, mode="r", encoding="utf-8") as f:
reader = csv.DictReader(f)
for row in reader:
if row["source"]:
if row["source"].startswith("!/!/"):
continue
keys.append(row["source"])
return keys
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log message claims the script found 'unique' source strings, but get_translation_keys returns a list and can include duplicates. Either deduplicate (e.g., return a set or de-dupe before printing) or adjust the message to not claim uniqueness.

Copilot uses AI. Check for mistakes.

print(f"Loading translations from {csv_path}...")
keys = get_translation_keys(csv_path)
print(f"Found {len(keys)} unique translation source strings.")
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log message claims the script found 'unique' source strings, but get_translation_keys returns a list and can include duplicates. Either deduplicate (e.g., return a set or de-dupe before printing) or adjust the message to not claim uniqueness.

Suggested change
print(f"Found {len(keys)} unique translation source strings.")
print(f"Found {len(keys)} translation source strings.")

Copilot uses AI. Check for mistakes.
Comment thread scripts/cleanup_translations.py Outdated
Comment on lines +88 to +107
# Second pass: search remaining unused keys in JS and CSS files
for directory in search_dirs:
for root, dirs, files in os.walk(directory):
for file in files:
if file.endswith((".js", ".css", ".scss", ".txt", ".md")):
file_path = os.path.join(root, file)
if "translations/csv" in file_path or "node_modules" in file_path:
continue
try:
with open(file_path, "r", encoding="utf-8") as f:
content = f.read()

to_remove = set()
for key in unused:
if key in content:
to_remove.add(key)

unused -= to_remove
if not unused:
return unused
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does an O(number_of_files × number_of_candidate_keys) substring scan, which can get expensive if unused is large. A more efficient approach is to aggregate searchable content once (or stream files) and check each key once, or use a multi-pattern search approach (e.g., building an index, or a compiled matcher) to avoid re-scanning the same content for every key.

Copilot uses AI. Check for mistakes.
Comment thread scripts/cleanup_translations.py Outdated
Comment on lines +59 to +82
# Also need to consider hardcoded constant keys used in test-translations.py
extra_keys = set(
[
"English Government of Canada signature",
"French Government of Canada signature",
"Empty",
"1 template",
"Number must have 10 digits",
"bad invitation link",
"invitation expired",
"password",
"Your service already uses ",
"Try again. Something’s wrong with this code",
"Code already sent, wait 10 seconds",
"You cannot delete a default email reply to address if other reply to addresses exist",
"Code has expired",
"Code already sent",
"Code has already been used",
"Code not found",
"as an email reply-to address.",
"You cannot remove the only user for a service",
"Cannot send to international mobile numbers",
]
)
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoding extra_keys creates a drift risk against the real source of truth (the constants used by test-translations). Consider loading these from a shared module/config (or importing the list from the test-translations implementation if it’s in Python), so future translation key additions don’t require updating multiple places.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown

if temp_file and os.path.exists(temp_file):
try:
os.remove(temp_file)
except Exception:

Check notice

Code scanning / CodeQL

Empty except Note

'except' clause does nothing but pass and there is no explanatory comment.

Copilot Autofix

AI about 1 month ago

In general, to fix an empty except block, you should either narrow the caught exception type and handle it appropriately (log it, clean up, re-raise, etc.) or, if ignoring is truly intended, add a clear comment explaining why it is safe to do so. Here, the block is in a finally used to delete temporary files; a good fix is to log a warning when cleanup fails while still allowing the function to proceed without raising a new exception.

The best way to fix this without changing existing functionality is to replace the bare pass in the except Exception: inside the finally block of get_keys_from_babel with a print that reports the failure to remove the temp file. This preserves the fact that cleanup errors do not propagate (we still don’t re-raise) but avoids completely swallowing the error. We don’t need new imports since print is already available. Concretely, in scripts/cleanup_translations.py, lines 64–67 should be updated so that the except block prints a message including the filename and the exception instead of doing nothing.

Suggested changeset 1
scripts/cleanup_translations.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/scripts/cleanup_translations.py b/scripts/cleanup_translations.py
--- a/scripts/cleanup_translations.py
+++ b/scripts/cleanup_translations.py
@@ -63,8 +63,9 @@
             if temp_file and os.path.exists(temp_file):
                 try:
                     os.remove(temp_file)
-                except Exception:
-                    pass
+                except Exception as cleanup_error:
+                    # Best-effort cleanup: log the failure but do not mask original exceptions
+                    print(f"Warning: failed to remove temporary file '{temp_file}': {cleanup_error}")
 
 
 def find_unused_translations(keys, search_dirs):
EOF
@@ -63,8 +63,9 @@
if temp_file and os.path.exists(temp_file):
try:
os.remove(temp_file)
except Exception:
pass
except Exception as cleanup_error:
# Best-effort cleanup: log the failure but do not mask original exceptions
print(f"Warning: failed to remove temporary file '{temp_file}': {cleanup_error}")


def find_unused_translations(keys, search_dirs):
Copilot is powered by AI and may make mistakes. Always verify output.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants