fix: add UTF-8 BOM to CSV transcript exports #3360

Open
barry47products wants to merge 5 commits into dimagi:main from barry47products:fix/csv-utf8-encoding

@barry47products

Product Description

CSV transcript exports now open correctly in Excel on both Windows and macOS. Previously, special characters such as curly apostrophes (’) and accented letters (é, ü, etc.) were garbled: for example, I’m appeared as I’m.

Technical Description

Excel does not respect the Content-Type: text/csv charset header when opening files directly. Without a UTF-8 Byte Order Mark (BOM) at the start of the file, Excel defaults to the system locale encoding (Windows-1252 on most systems), causing multi-byte UTF-8 sequences to be misread.
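The misreading described above can be reproduced outside Excel. The following is an illustrative sketch, not code from this PR: it assumes Python's cp1252 codec is a fair stand-in for Excel's locale default, and the sample string is made up for the demonstration.

```python
# Illustrative only: simulate a reader that assumes Windows-1252.
text = "I\u2019m at the caf\u00e9"  # curly apostrophe (U+2019) and e-acute (U+00E9)

utf8_bytes = text.encode("utf-8")

# Decoding UTF-8 bytes as cp1252 turns each byte of a multi-byte
# sequence into its own character, which is the garbling seen in Excel.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# Python's "utf-8-sig" codec prepends the BOM on encode and strips it on
# decode, which mirrors what a BOM-aware reader does.
with_bom = text.encode("utf-8-sig")
assert with_bom[:3] == b"\xef\xbb\xbf"
assert with_bom.decode("utf-8-sig") == text
```

The three leading bytes EF BB BF are what a BOM-aware reader uses to select UTF-8 instead of falling back to the locale encoding.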

The fix yields "\ufeff" (the UTF-8 BOM) as the first chunk from export_rows_to_csv_stream in apps/experiments/export.py. This is a one-line change to the streaming CSV generator used by StreamingHttpResponse exports.
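For readers unfamiliar with the pattern, a minimal sketch of a streaming CSV generator that emits the BOM first might look like the following. The function body, the _Echo helper, and the sample rows are assumptions for illustration; the real export_rows_to_csv_stream in apps/experiments/export.py may be structured differently.

```python
import csv
from collections.abc import Iterable, Iterator

UTF8_BOM = "\ufeff"  # U+FEFF; encodes to the bytes EF BB BF in UTF-8


class _Echo:
    """Hypothetical pseudo-buffer: write() returns the value instead of storing it."""

    def write(self, value: str) -> str:
        return value


def rows_to_csv_stream(rows: Iterable[list]) -> Iterator[str]:
    """Hypothetical stand-in for export_rows_to_csv_stream."""
    writer = csv.writer(_Echo())
    yield UTF8_BOM  # first chunk, so the BOM lands at byte 0 of the download
    for row in rows:
        # csv.writer.writerow returns whatever the underlying write() returned,
        # so each row comes back as one formatted CSV line.
        yield writer.writerow(row)


chunks = list(rows_to_csv_stream([["message"], ["I\u2019m at the caf\u00e9"]]))
body = "".join(chunks).encode("utf-8")
print(body[:3])
```

In a Django view, a generator like this would typically be handed to StreamingHttpResponse with content_type="text/csv"; that wiring is omitted here so the sketch runs without Django.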

Migrations

N/A

Demo

Before: downloading a CSV export containing I’m or café and opening it in Excel produces I’m / café.

After: characters render correctly.

Docs and Changelog

  • This PR requires docs/changelog update

Fixes #697

Excel misreads UTF-8 CSVs as Windows-1252 when no BOM is present, garbling special characters (e.g. "I’m" → "I’m"). Yield the BOM as the first chunk from export_rows_to_csv_stream so Excel on both Windows and macOS correctly identifies the encoding.

Fixes dimagi#697
codescene-delta-analysis[bot]

This comment was marked as outdated.

coderabbitai Bot commented May 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 23f7fac2-de00-4f88-8222-b506816573c6

📥 Commits

Reviewing files that changed from the base of the PR and between 1c1d837 and d681a7e.

📒 Files selected for processing (2)
  • apps/experiments/export.py
  • apps/experiments/tests/test_export.py

Walkthrough

This PR adds a UTF-8 BOM (Byte Order Mark) prefix to the CSV export stream in export_rows_to_csv_stream(). The implementation yields a leading "\ufeff" string before streaming the CSV lines, ensuring Excel and other tools correctly interpret the character encoding. Two new test cases verify that the BOM is emitted as the first chunk and that special Unicode characters are preserved in the output.
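The two checks mentioned in the walkthrough could be sketched roughly as below. This is not the PR's test code: fake_csv_stream is a hypothetical stand-in for export_rows_to_csv_stream, and the test names and sample rows are invented for illustration.

```python
import csv
import io

UTF8_BOM = "\ufeff"


def fake_csv_stream(rows):
    """Hypothetical stand-in for export_rows_to_csv_stream."""
    yield UTF8_BOM
    for row in rows:
        buf = io.StringIO()
        csv.writer(buf).writerow(row)
        yield buf.getvalue()


def test_bom_is_first_chunk():
    first = next(fake_csv_stream([["a"]]))
    assert first == UTF8_BOM


def test_special_characters_survive():
    rows = [["id", "message"], ["1", "I\u2019m at the caf\u00e9"]]
    payload = "".join(fake_csv_stream(rows)).encode("utf-8")
    # "utf-8-sig" strips the leading BOM, as a BOM-aware consumer would.
    decoded = payload.decode("utf-8-sig")
    parsed = list(csv.reader(io.StringIO(decoded, newline="")))
    assert parsed == rows


test_bom_is_first_chunk()
test_special_characters_survive()
```

Decoding with "utf-8-sig" before parsing keeps the BOM out of the first header cell, which is a common surprise when a BOM-prefixed CSV is read back with plain "utf-8".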

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: adding UTF-8 BOM to CSV exports to fix character encoding issues in Excel. |
| Description check | ✅ Passed | The description includes all key required sections with comprehensive technical details about the UTF-8 BOM fix, product impact, and issue reference. |
| Linked Issues check | ✅ Passed | The code change directly addresses issue #697 by adding UTF-8 BOM to the CSV export stream, ensuring Excel correctly interprets special characters. |
| Out of Scope Changes check | ✅ Passed | All changes are focused and in-scope: the one-line modification to export_rows_to_csv_stream and corresponding test additions to verify BOM and unicode character handling. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@snopoke snopoke left a comment

Thanks for the contribution @barry47products.

Comment thread apps/experiments/export.py Outdated
Co-authored-by: Simon Kelly <skelly@dimagi.com>

Comment thread apps/experiments/tests/test_export.py Outdated
@barry47products

@snopoke I wanted to check in with you on the CodeScene findings in https://github.com/dimagi/open-chat-studio/pull/3360/checks?check_run_id=76382291625

Is this something you typically address with a refactor or suppress?

snopoke commented May 19, 2026

> @snopoke I wanted to check in with you on the CodeScene findings in https://github.com/dimagi/open-chat-studio/pull/3360/checks?check_run_id=76382291625
>
> Is this something you typically address with a refactor or suppress?

We are experimenting with CodeScene. You can treat these as a quality signal, but they are not compulsory to address, particularly in test code.

@barry47products barry47products marked this pull request as ready for review May 19, 2026 13:57
Replace literal BOM characters with a named UTF8_BOM = "\ufeff"
constant exported from apps/experiments/export.py, used in both the
source and the test. Addresses review feedback on dimagi#3360 to make the
BOM character self-documenting and avoid duplicating the literal
across modules.
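
As a quick sanity check on such a constant, the sketch below compares it with the byte sequence the standard library defines. It assumes UTF8_BOM holds the single code point U+FEFF, since the literal is invisible in the rendered page.

```python
import codecs

# Assumed value of the constant: the single code point U+FEFF.
UTF8_BOM = "\ufeff"

# Encoded as UTF-8, it is exactly the 3-byte mark defined in the codecs module.
encoded = UTF8_BOM.encode("utf-8")
assert encoded == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# One character in str form, three bytes on the wire.
assert len(UTF8_BOM) == 1
assert len(encoded) == 3
```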

@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Gates Failed
Enforce advisory code health rules (1 file with Code Duplication)

Gates Passed
3 Quality Gates Passed

See analysis details in CodeScene

Reason for failure
| Enforce advisory code health rules | Violations | Code Health Impact |
| --- | --- | --- |
| test_export.py | 1 advisory rule | 10.00 → 9.94 |

Quality Gate Profile: Clean Code Collective


 from apps.chat.models import ChatMessage, ChatMessageType
-from apps.experiments.export import filtered_export_to_csv
+from apps.experiments.export import UTF8_BOM, export_rows_to_csv_stream, filtered_export_to_csv

❌ New issue: Code Duplication
The module contains 4 functions with similar structure: test_participant_data_export, test_participant_data_export_empty_data, test_participant_data_export_empty_diff, test_session_state_export

Development

Successfully merging this pull request may close these issues.

Character Encoding Issue in Exported CSV Transcripts

2 participants