Idea: Cleaning up unused translations by amazingphilippe · Pull Request #2635 · cds-snc/notification-admin

amazingphilippe · 2026-03-30T15:24:28Z

Summary | Résumé

We have a lot of translated content strings that we no longer use, mostly from the days when content was hardcoded. Do we want to clean this up? This script can do it automatically, and we can always run make test-translations as a safety measure.

Test instructions | Instructions pour tester la modification

Run make cleanup-translations.
It might find unused translations and prompt you to confirm deletion.
Run make test-translations.
It should not find missing translations.

+                            unused -= to_remove
+                            if not unused:
+                                return unused
+                    except Exception:


In general, empty except blocks should either (a) be removed so exceptions propagate, or (b) be replaced with handling that logs or compensates for the error, and possibly narrows the exception type. Here, we want to preserve existing behavior as much as possible: continue scanning other files even if one file fails, but avoid completely silent failures.

The best fix is to keep the try/except but:

Narrow the exception to expected I/O/decoding errors (OSError, UnicodeDecodeError) instead of catching Exception.

Add minimal logging to stderr indicating that a particular file could not be read and why.

This maintains the current control flow (the function continues scanning other files and ultimately returns unused) while making problems visible to the user. We can implement this entirely inside find_unused_translations without changing any imports or external behavior.

Concretely, in scripts/cleanup_translations.py, lines 89–102 should be updated so that the except block prints an informative message, and the caught exception type is narrowed.
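A minimal sketch of that fix, assuming the second pass reads each file roughly as in the hunk above. The helper name `read_text_or_warn` is hypothetical and not part of the PR, which inlines this logic in find_unused_translations; it only illustrates narrowing the exception type and reporting to stderr.

```python
import sys


def read_text_or_warn(file_path):
    """Return the file's text, or None if it cannot be read.

    Hypothetical helper for illustration; not a function from the PR.
    """
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            return f.read()
    except (OSError, UnicodeDecodeError) as exc:
        # Report the failure, then let the caller keep scanning other files.
        print(f"Warning: could not read {file_path}: {exc}", file=sys.stderr)
        return None
```

A caller would skip the file when the result is None, which keeps the existing "continue with the next file" control flow.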

Copilot

Pull request overview

Adds an interactive utility to identify and remove unused French translation strings by combining Babel extraction results with a fallback text search, and wires it into the Makefile.

Changes:

Introduces scripts/cleanup_translations.py to extract used strings via Babel and detect unused keys.
Adds an interactive deletion flow to remove unused rows from app/translations/csv/fr.csv.
Adds make cleanup-translations target to run the script.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 8 comments.

File	Description
scripts/cleanup_translations.py	New script to detect and optionally delete unused translation strings using Babel + source search.
Makefile	Adds a new `cleanup-translations` target to run the cleanup script.


Copilot · 2026-03-30T15:27:46Z

+        return set()
+
+
+def find_unused_translations(keys, search_dirs):
+    # First, get keys that are used in code via Babel (covers Python/HTML)
+    babel_keys = get_keys_from_babel()


On any Babel extraction error, get_keys_from_babel() returns an empty set, which makes unused = set(keys) - babel_keys - extra_keys treat almost everything as unused and can lead to mass deletion. Safer behavior is to fail fast (raise/exit non-zero) or return a sentinel (e.g., None) and have find_unused_translations abort without offering deletion when extraction fails.

Suggested change

-        return set()
-
-
-def find_unused_translations(keys, search_dirs):
-    # First, get keys that are used in code via Babel (covers Python/HTML)
-    babel_keys = get_keys_from_babel()
+        return None
+
+
+def find_unused_translations(keys, search_dirs):
+    # First, get keys that are used in code via Babel (covers Python/HTML)
+    babel_keys = get_keys_from_babel()
+    if babel_keys is None:
+        print("Aborting: unable to extract keys via Babel; not proceeding with cleanup.")
+        sys.exit(1)

Copilot · 2026-03-30T15:27:47Z

+        subprocess.run(
+            ["poetry", "run", "pybabel", "extract", "-F", "babel.cfg", "-k", "_l", "-o", "/tmp/cleanup_messages.po", "."],
+            check=True,
+            capture_output=True,
+        )
+
+        # 2. Run po2csv
+        subprocess.run(
+            ["poetry", "run", "po2csv", "/tmp/cleanup_messages.po", "/tmp/cleanup_messages.csv"], check=True, capture_output=True
+        )


The script uses fixed /tmp/cleanup_messages.* paths and only cleans them up on the success path. This can clash across concurrent runs (e.g., parallel CI/jobs) and can leak temp files when any subprocess step fails. Use tempfile to create unique temp paths and ensure cleanup in a finally block (also consider specifying encoding='utf-8' when reading the temp CSV for consistent behavior).
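One way this could look, as a sketch rather than the PR's actual code. It uses `tempfile.TemporaryDirectory` as a context manager instead of an explicit finally block, which gives the same guarantee that the files are removed on failure. The `run` parameter is an assumption added only so the sketch can be exercised without poetry, and the `source` column name is assumed to match the one used for fr.csv.

```python
import csv
import os
import subprocess
import tempfile


def get_keys_from_babel(run=subprocess.run):
    """Extract keys into a per-run temp directory that is always removed.

    `run` is injectable for illustration only; it defaults to subprocess.run.
    """
    # Each call gets its own directory, so concurrent runs cannot clash, and the
    # directory is deleted on exit even if a subprocess raises.
    with tempfile.TemporaryDirectory(prefix="cleanup_translations_") as tmp_dir:
        po_path = os.path.join(tmp_dir, "messages.po")
        csv_path = os.path.join(tmp_dir, "messages.csv")

        run(
            ["poetry", "run", "pybabel", "extract", "-F", "babel.cfg", "-k", "_l", "-o", po_path, "."],
            check=True,
            capture_output=True,
        )
        run(["poetry", "run", "po2csv", po_path, csv_path], check=True, capture_output=True)

        # Explicit encoding keeps behavior consistent across platforms.
        with open(csv_path, newline="", encoding="utf-8") as csvfile:
            # Assumes a "source" column, as in app/translations/csv/fr.csv.
            return {row["source"] for row in csv.DictReader(csvfile) if row.get("source")}
```

Because errors propagate out of the `with` block, a failed extraction stops the script instead of producing an empty key set.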

Copilot · 2026-03-30T15:27:47Z

+
+        # 3. Read extracted keys
+        extracted_keys = set()
+        with open("/tmp/cleanup_messages.csv", newline="") as csvfile:


The script uses fixed /tmp/cleanup_messages.* paths and only cleans them up on the success path. This can clash across concurrent runs (e.g., parallel CI/jobs) and can leak temp files when any subprocess step fails. Use tempfile to create unique temp paths and ensure cleanup in a finally block (also consider specifying encoding='utf-8' when reading the temp CSV for consistent behavior).

Copilot · 2026-03-30T15:27:47Z

+        os.remove("/tmp/cleanup_messages.po")
+        os.remove("/tmp/cleanup_messages.csv")


The script uses fixed /tmp/cleanup_messages.* paths and only cleans them up on the success path. This can clash across concurrent runs (e.g., parallel CI/jobs) and can leak temp files when any subprocess step fails. Use tempfile to create unique temp paths and ensure cleanup in a finally block (also consider specifying encoding='utf-8' when reading the temp CSV for consistent behavior).

Copilot · 2026-03-30T15:27:48Z

+def get_translation_keys(csv_path):
+    keys = []
+    if not os.path.exists(csv_path):
+        return keys
+
+    with open(csv_path, mode="r", encoding="utf-8") as f:
+        reader = csv.DictReader(f)
+        for row in reader:
+            if row["source"]:
+                if row["source"].startswith("!/!/"):
+                    continue
+                keys.append(row["source"])
+    return keys


The log message claims the script found 'unique' source strings, but get_translation_keys returns a list and can include duplicates. Either deduplicate (e.g., return a set or de-dupe before printing) or adjust the message to not claim uniqueness.
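If deduplication is the preferred option, an order-preserving version could be as small as the sketch below. The helper name `dedupe_keys` is hypothetical; the PR may instead return a set or simply reword the message.

```python
def dedupe_keys(keys):
    """Return keys with duplicates removed, keeping first-seen order."""
    # dict preserves insertion order, so this is an order-stable de-dupe.
    return list(dict.fromkeys(keys))
```

Keeping a list (rather than a set) preserves the CSV row order, which may matter if the keys are later printed for confirmation.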

Copilot · 2026-03-30T15:27:48Z

+
+    print(f"Loading translations from {csv_path}...")
+    keys = get_translation_keys(csv_path)
+    print(f"Found {len(keys)} unique translation source strings.")


The log message claims the script found 'unique' source strings, but get_translation_keys returns a list and can include duplicates. Either deduplicate (e.g., return a set or de-dupe before printing) or adjust the message to not claim uniqueness.

Suggested change

-    print(f"Found {len(keys)} unique translation source strings.")
+    print(f"Found {len(keys)} translation source strings.")

Copilot · 2026-03-30T15:27:48Z

+    # Second pass: search remaining unused keys in JS and CSS files
+    for directory in search_dirs:
+        for root, dirs, files in os.walk(directory):
+            for file in files:
+                if file.endswith((".js", ".css", ".scss", ".txt", ".md")):
+                    file_path = os.path.join(root, file)
+                    if "translations/csv" in file_path or "node_modules" in file_path:
+                        continue
+                    try:
+                        with open(file_path, "r", encoding="utf-8") as f:
+                            content = f.read()
+
+                            to_remove = set()
+                            for key in unused:
+                                if key in content:
+                                    to_remove.add(key)
+
+                            unused -= to_remove
+                            if not unused:
+                                return unused


This does an O(number_of_files × number_of_candidate_keys) substring scan, which can get expensive if unused is large. A more efficient approach is to aggregate searchable content once (or stream files) and check each key once, or use a multi-pattern search approach (e.g., building an index, or a compiled matcher) to avoid re-scanning the same content for every key.
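A rough sketch of the "aggregate content once, check each key once" option, under the assumption that the searchable files fit comfortably in memory. The function names are hypothetical. It does not change the cost of each substring search and it gives up the early return; what it removes is the nested per-file, per-key loop in Python. A true multi-pattern matcher would be a separate, larger change.

```python
import os
import sys

SEARCH_EXTENSIONS = (".js", ".css", ".scss", ".txt", ".md")


def collect_searchable_text(search_dirs):
    """Read every candidate file once and join the contents into one string."""
    chunks = []
    for directory in search_dirs:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                path = os.path.join(root, name)
                if not name.endswith(SEARCH_EXTENSIONS):
                    continue
                if "translations/csv" in path or "node_modules" in path:
                    continue
                try:
                    with open(path, "r", encoding="utf-8") as f:
                        chunks.append(f.read())
                except (OSError, UnicodeDecodeError) as exc:
                    print(f"Warning: could not read {path}: {exc}", file=sys.stderr)
    # NUL separator so a key cannot match across a file boundary.
    return "\0".join(chunks)


def filter_unused(unused, search_dirs):
    """Return the keys from `unused` that appear in none of the files."""
    haystack = collect_searchable_text(search_dirs)
    return {key for key in unused if key not in haystack}
```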

Copilot · 2026-03-30T15:27:48Z

+    # Also need to consider hardcoded constant keys used in test-translations.py
+    extra_keys = set(
+        [
+            "English Government of Canada signature",
+            "French Government of Canada signature",
+            "Empty",
+            "1 template",
+            "Number must have 10 digits",
+            "bad invitation link",
+            "invitation expired",
+            "password",
+            "Your service already uses ",
+            "Try again. Something’s wrong with this code",
+            "Code already sent, wait 10 seconds",
+            "You cannot delete a default email reply to address if other reply to addresses exist",
+            "Code has expired",
+            "Code already sent",
+            "Code has already been used",
+            "Code not found",
+            "as an email reply-to address.",
+            "You cannot remove the only user for a service",
+            "Cannot send to international mobile numbers",
+        ]
+    )


Hardcoding extra_keys creates a drift risk against the real source of truth (the constants used by test-translations). Consider loading these from a shared module/config (or importing the list from the test-translations implementation if it’s in Python), so future translation key additions don’t require updating multiple places.
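As one possible shape for a shared source of truth, the keys could live in a small JSON file that both scripts read. This is only a sketch: the file, its location, and the `load_extra_keys` name are hypothetical, and the PR currently keeps the list inline. JSON is used here because it preserves significant whitespace such as the trailing space in "Your service already uses ".

```python
import json


def load_extra_keys(path):
    """Load the hardcoded keys from a JSON array shared by both scripts.

    Hypothetical helper; the shared file does not exist in the PR.
    """
    with open(path, encoding="utf-8") as f:
        keys = json.load(f)
    # Fail loudly on a malformed file rather than silently protecting nothing.
    if not isinstance(keys, list) or not all(isinstance(k, str) for k in keys):
        raise ValueError(f"{path} must contain a JSON array of strings")
    return set(keys)
```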

github-actions · 2026-03-30T15:30:27Z

🧪 Review environment

https://bw5fulsda4vf5476xpfzrmclwy0yhdcl.lambda-url.ca-central-1.on.aws/

+            if temp_file and os.path.exists(temp_file):
+                try:
+                    os.remove(temp_file)
+                except Exception:


In general, to fix an empty except block, you should either narrow the caught exception type and handle it appropriately (log it, clean up, re-raise, etc.) or, if ignoring is truly intended, add a clear comment explaining why it is safe to do so. Here, the block is in a finally used to delete temporary files; a good fix is to log a warning when cleanup fails while still allowing the function to proceed without raising a new exception.

The best way to fix this without changing existing functionality is to replace the bare pass in the except Exception: inside the finally block of get_keys_from_babel with a print that reports the failure to remove the temp file. This preserves the fact that cleanup errors do not propagate (we still don’t re-raise) but avoids completely swallowing the error. We don’t need new imports since print is already available. Concretely, in scripts/cleanup_translations.py, lines 64–67 should be updated so that the except block prints a message including the filename and the exception instead of doing nothing.
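A sketch of what that cleanup could look like if pulled into a helper. The name `remove_temp_file` is hypothetical. Unlike the suggestion above, this version also narrows the exception to OSError and writes to stderr, which requires `sys` to be imported.

```python
import os
import sys


def remove_temp_file(temp_file):
    """Best-effort removal: report failures instead of swallowing them."""
    if temp_file and os.path.exists(temp_file):
        try:
            os.remove(temp_file)
        except OSError as exc:
            # Do not re-raise: a cleanup error should not mask the real result.
            print(f"Warning: could not remove temp file {temp_file}: {exc}", file=sys.stderr)
```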

Cleaning up unused translations

6bc2864

amazingphilippe requested a review from Copilot March 30, 2026 15:24

Copilot started reviewing on behalf of amazingphilippe March 30, 2026 15:25

github-advanced-security AI found potential problems Mar 30, 2026

Copilot AI reviewed Mar 30, 2026

amazingphilippe added 2 commits April 1, 2026 13:07

address copilot review

0c6d9a6

format

90595e4

github-advanced-security AI found potential problems Apr 1, 2026

amazingphilippe wants to merge 3 commits into main from debt/unused-translation-script
