fix: add UTF-8 BOM to CSV transcript exports#3360
Conversation
Excel misreads UTF-8 CSVs as Windows-1252 when no BOM is present, garbling special characters (e.g. "I'm" → "I’m"). Yield the BOM as the first chunk from export_rows_to_csv_stream so Excel on both Windows and macOS correctly identifies the encoding. Fixes dimagi#697
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR adds a UTF-8 BOM (Byte Order Mark) prefix to the CSV export stream in Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
snopoke
left a comment
There was a problem hiding this comment.
Thanks for the contribution @barry47products.
Co-authored-by: Simon Kelly <skelly@dimagi.com>
|
@snopoke I wanted to check in with you on the CodeScene findings in https://github.com/dimagi/open-chat-studio/pull/3360/checks?check_run_id=76382291625 Is this something you typically address with a refactor or suppress? |
We are experimenting with CodeScene - you can threat these as quality signal but not compulsory to address, particularly in test code. |
Replace literal BOM characters with a named UTF8_BOM = "" constant exported from apps/experiments/export.py, used in both the source and the test. Addresses review feedback on dimagi#3360 to make the BOM character self-documenting and avoid duplicating the literal across modules.
There was a problem hiding this comment.
Gates Failed
Enforce advisory code health rules
(1 file with Code Duplication)
Gates Passed
3 Quality Gates Passed
See analysis details in CodeScene
Reason for failure
| Enforce advisory code health rules | Violations | Code Health Impact | |
|---|---|---|---|
| test_export.py | 1 advisory rule | 10.00 → 9.94 | Suppress |
Quality Gate Profile: Clean Code Collective
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.
|
|
||
| from apps.chat.models import ChatMessage, ChatMessageType | ||
| from apps.experiments.export import filtered_export_to_csv | ||
| from apps.experiments.export import UTF8_BOM, export_rows_to_csv_stream, filtered_export_to_csv |
There was a problem hiding this comment.
❌ New issue: Code Duplication
The module contains 4 functions with similar structure: test_participant_data_export,test_participant_data_export_empty_data,test_participant_data_export_empty_diff,test_session_state_export
Product Description
CSV transcript exports now open correctly in Excel on both Windows and macOS. Previously, special characters such as curly apostrophes (
') and accented letters (é,ü, etc.) were garbled — for exampleI'mappeared asI’m.Technical Description
Excel does not respect the
Content-Type: text/csvcharset header when opening files directly. Without a UTF-8 Byte Order Mark (BOM) at the start of the file, Excel defaults to the system locale encoding (Windows-1252 on most systems), causing multi-byte UTF-8 sequences to be misread.The fix yields
(UTF-8 BOM) as the first chunk fromexport_rows_to_csv_streaminapps/experiments/export.py. This is a one-line change to the streaming CSV generator used byStreamingHttpResponseexports.Migrations
N/A
Demo
Before: downloading a CSV export containing
I'morcaféand opening in Excel producesI’m/café.After: characters render correctly.
Docs and Changelog
Fixes #697