remilejeune2 commented Dec 22, 2025

Summary by CodeRabbit

  • Documentation
    • Added comprehensive Data Retention guidance: data categories, default retention ranges (0–12 months, where a setting of 0 means deletion within 24 hours), zero-data mode behavior, and a note that custom and zero retention options are limited to Enterprise plans.
    • Clarified that usage tracking retains essential API metadata and immutable logs for a limited period, and that zero-data mode is ephemeral: no data is stored at rest, and results become inaccessible once delivered.
  • Navigation
    • Added a new Data Retention page to Limits & Specifications.


coderabbitai bot commented Dec 22, 2025

📝 Walkthrough


Adds a new documentation page explaining two data retention modes (Standard and Zero Data Retention), data categories (audio input, transcription output, API metadata, logs), retention durations (a setting of 0 means deletion within 24 hours; the maximum is 12 months), and Enterprise-only options for custom/zero retention; clarifies Zero Data Retention behaviors and delivery constraints.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Data retention docs & nav: chapters/limits-and-specifications/data-retention.mdx, docs.json | New documentation page added and registered in navigation. Defines data categories; outlines Standard retention (0 = deletion within 24 hours; up to 12 months max), Zero Data Retention (no audio at rest, no retrievable transcripts/metadata, results delivered only via callbacks, async requires audio_url), and plan availability (custom/zero retention = Enterprise-only). Notes limited immutable logs and retained API metadata for usage/tracking. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

"I hopped through docs with ears so bright,
New retention rules tucked out of sight.
No files at rest, just passing breeze,
Enterprise doors guard privacy with ease.
I nibble notes and then depart — secure and light at heart." 🐇

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title directly matches the main changeset: adding a new data retention documentation page to the Limits & Specifications section. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (2)
chapters/limits-and-specifications/data-retention.mdx (2)

8-8: Minor wording: Consider "replay workflows" (plural).

The phrase "replay workflow" reads slightly awkwardly. Consider changing to "replay workflows" for better flow.

🔎 Proposed fix
-By default, Gladia stores transcription data to allow users to retrieve results, original audio, debug issues, and replay workflow.
+By default, Gladia stores transcription data to allow users to retrieve results, original audio, debug issues, and replay workflows.

29-29: Consider consistent heading hierarchy.

These subsections use h3 (###) headings, but given the document structure, h2 (##) might be more appropriate for better consistency, especially since line 42 uses h2 for a similar-level section.

🔎 Proposed fix
-### Zero Data Retention behavior
+## Zero Data Retention behavior
 
 When Zero Data Retention is enabled, Gladia processes data ephemerally.
 
-### Key guarantees
+## Key guarantees

Also applies to: 33-33

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9fcd25 and 655e251.

📒 Files selected for processing (1)
  • chapters/limits-and-specifications/data-retention.mdx
🧰 Additional context used
🪛 GitHub Check: Mintlify Validation (gladia-95) - vale-spellcheck
chapters/limits-and-specifications/data-retention.mdx

[warning] 8-8: chapters/limits-and-specifications/data-retention.mdx#L8
Did you really mean 'Gladia'?


[warning] 18-18: chapters/limits-and-specifications/data-retention.mdx#L18
Did you really mean 'Gladia'?


[warning] 31-31: chapters/limits-and-specifications/data-retention.mdx#L31
Did you really mean 'Gladia'?

🔇 Additional comments (1)
chapters/limits-and-specifications/data-retention.mdx (1)

42-48: LGTM! Clear documentation of API behavior changes.

This section clearly explains how Zero Data Retention affects API behavior, including disabled file uploads and webhook-only result delivery. The restrictions are well-documented.

coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
chapters/limits-and-specifications/data-retention.mdx (4)

10-10: Fix grammar issues in this sentence.

This line has two grammar issues:

  1. Should be "eligible for" instead of "eligible to" (as previously noted)
  2. Missing the article "the" before "zero data retention option"
🔎 Proposed fix
-Enterprise users are eligible for custom data retention and zero data retention option.
+Enterprise users are eligible for custom data retention and the zero data retention option.

14-14: Clarify "Custom (up to 24 hours)" and the apparent contradiction.

As previously noted, this phrase is ambiguous. Additionally, it's unclear why Enterprise users would have a maximum of 24 hours when Paid and Free plans offer up to 12 months. This seems counterintuitive unless "Custom" means something different from the standard retention period.

Please clarify:

  • Does "Custom (up to 24 hours)" refer to a different retention configuration?
  • Why would Enterprise have seemingly less retention than Paid/Free plans?
  • Or does "Custom" mean Enterprise can configure any value, with 24 hours being one option among many?

24-24: Clarify metadata retention behavior after the retention period.

As previously noted, the metadata retention description is ambiguous. It's unclear:

  • What "Minimal" metadata means (which specific fields?)
  • How long the minimal metadata is retained
  • Whether this applies to all plans or only certain ones

This is important for compliance (GDPR, CCPA) and privacy documentation.


25-25: Specify the retention period for logs instead of "Immutable".

As previously noted, "Immutable" doesn't clearly communicate the retention policy. In the context of data retention documentation, users need to know:

  • How long logs are retained
  • Whether they are eventually deleted or kept indefinitely
  • How this aligns with compliance requirements
🧹 Nitpick comments (2)
chapters/limits-and-specifications/data-retention.mdx (2)

35-38: Add periods to list items for consistency.

The list items should end with periods for consistency with standard documentation formatting, especially since these are complete declarative statements.

🔎 Proposed fix
-- No audio files are stored
-- No transcripts are stored
-- No results are persisted
-- No replay or retrieval is possible
+- No audio files are stored.
+- No transcripts are stored.
+- No results are persisted.
+- No replay or retrieval is possible.

46-48: Add periods to list items and clarify file upload restriction.

For consistency with documentation standards:

  1. Add periods to complete the list items
  2. Consider clarifying what "file uploads" means (e.g., direct file uploads via multipart/form-data, upload endpoints, etc.)
🔎 Proposed fix
-- File uploads are disabled. Only `audio_url` inputs are supported on the asynchronous API.
-- Results are delivered **only via callbacks or webhooks**
-- Polling or later retrieval is not supported
+- Direct file uploads are disabled. Only `audio_url` inputs are supported for the asynchronous API.
+- Results are delivered **only via callbacks or webhooks**.
+- Polling or later retrieval is not supported.
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 655e251 and fcbd8ab.

📒 Files selected for processing (2)
  • chapters/limits-and-specifications/data-retention.mdx
  • docs.json
🧰 Additional context used
🪛 GitHub Check: Mintlify Validation (gladia-95) - vale-spellcheck
chapters/limits-and-specifications/data-retention.mdx

[warning] 8-8: chapters/limits-and-specifications/data-retention.mdx#L8
Did you really mean 'Gladia'?


[warning] 18-18: chapters/limits-and-specifications/data-retention.mdx#L18
Did you really mean 'Gladia'?


[warning] 31-31: chapters/limits-and-specifications/data-retention.mdx#L31
Did you really mean 'Gladia'?

🔇 Additional comments (1)
docs.json (1)

107-108: LGTM!

The new data retention page is correctly added to the navigation structure with proper path formatting and logical placement within the "Limits & Specifications" group.

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
chapters/limits-and-specifications/data-retention.mdx (1)

27-29: Consider streamlining repetitive phrasing.

Lines 27-29 each end with the identical phrase "cannot be retrieved through the API or in the Gladia's playground" which makes the text repetitive.

🔎 Suggested streamlining
+Under Zero Data Retention, the following data cannot be retrieved through the API or viewed in [Gladia's playground](https://app.gladia.io/?utm_campaign=197609689-Website&utm_source=documentation&utm_content=sign-up):
+
-- **No audio files are stored**: Files cannot be retrieved through the API or in the [Gladia's playground](https://app.gladia.io/?utm_campaign=197609689-Website&utm_source=documentation&utm_content=sign-up). File upload is also disabled; the asynchronous API must use `audio_url` inputs.
-- **No transcripts are stored**: Transcription results cannot be retrieved through the API and are not visible in the [Gladia's playground](https://app.gladia.io/?utm_campaign=197609689-Website&utm_source=documentation&utm_content=sign-up).
-- **No metadata is persisted**: Transcription API calls, audio duration, and other metadata cannot be retrieved through the API or in the [Gladia's playground](https://app.gladia.io/?utm_campaign=197609689-Website&utm_source=documentation&utm_content=sign-up).
+- **No audio files are stored**: File upload is disabled; the asynchronous API must use `audio_url` inputs.
+- **No transcripts are stored**: Transcription results are delivered once and not persisted.
+- **No metadata is persisted**: Transcription API calls, audio duration, and other metadata are not stored.
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between fcbd8ab and dc9db29.

📒 Files selected for processing (1)
  • chapters/limits-and-specifications/data-retention.mdx
🧰 Additional context used
🪛 GitHub Check: Mintlify Validation (gladia-95) - vale-spellcheck
chapters/limits-and-specifications/data-retention.mdx

[warning] 6-6: chapters/limits-and-specifications/data-retention.mdx#L6
Did you really mean 'Gladia'?


[warning] 21-21: chapters/limits-and-specifications/data-retention.mdx#L21
Did you really mean 'Gladia'?


[warning] 25-25: chapters/limits-and-specifications/data-retention.mdx#L25
Did you really mean 'Gladia'?

🔇 Additional comments (2)
chapters/limits-and-specifications/data-retention.mdx (2)

1-10: LGTM! Clear structure and comprehensive data types.

The front matter and data types list are well-structured and clearly define what data is processed.


17-19: LGTM! Clear eligibility statement.

The Info box clearly communicates the Enterprise-only availability of custom and zero retention options.

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
chapters/limits-and-specifications/data-retention.mdx (3)

21-21: Add Oxford comma for clarity.

For consistency with style guides and to avoid ambiguity, consider adding an Oxford comma in the metadata list.

🔎 Proposed fix
-To enable usage tracking, Gladia retains essential API metadata: request ID, timestamp, processing status and audio duration. Immutable logs are also maintained, for a limited period, to ensure service quality and reliability.
+To enable usage tracking, Gladia retains essential API metadata: request ID, timestamp, processing status, and audio duration. Immutable logs are also maintained, for a limited period, to ensure service quality and reliability.

21-21: Clarify the relationship between internal metadata retention and user accessibility under ZDR.

Line 21 states that "essential API metadata" is retained for usage tracking, but lines 25-29 describe Zero Data Retention as having "no data stored at rest" with "no metadata retrieval." This creates ambiguity about whether metadata is truly not stored, or simply not accessible to users.

For transparency and compliance purposes, consider explicitly distinguishing between:

  • Internal retention (metadata retained by Gladia for billing, usage monitoring, and system reliability—not accessible to users)
  • User-accessible data (zero storage, immediate deletion, no retrieval possible)

A brief clarifying note in the ZDR section would help users understand that while they cannot retrieve any data, Gladia retains minimal metadata internally for operational purposes as described in line 21.

Based on past review comments, this distinction appears to be understood internally but could be clearer in the documentation.

🔎 Example clarification

Add a note at the end of the Zero Data Retention section (after line 29):

**Note:** While no user data is accessible under Zero Data Retention, Gladia retains minimal API metadata (request ID, timestamp, processing status, and audio duration) internally for usage tracking and billing purposes, as described above. This metadata is not accessible through the API or playground.

Also applies to: 25-29


21-21: Consider specifying the retention period for immutable logs.

The phrase "for a limited period" is vague. If the retention period for operational logs is defined (e.g., 30 days, 90 days, 1 year), specifying it would improve transparency and help with compliance requirements.

If the specific period is not yet determined or varies, consider adding "contact support for details" or similar guidance.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between dc9db29 and 8404a24.

📒 Files selected for processing (1)
  • chapters/limits-and-specifications/data-retention.mdx
🧰 Additional context used
🪛 GitHub Check: Mintlify Validation (gladia-95) - vale-spellcheck
chapters/limits-and-specifications/data-retention.mdx

[warning] 6-6: chapters/limits-and-specifications/data-retention.mdx#L6
Did you really mean 'Gladia'?


[warning] 21-21: chapters/limits-and-specifications/data-retention.mdx#L21
Did you really mean 'Gladia'?


[warning] 25-25: chapters/limits-and-specifications/data-retention.mdx#L25
Did you really mean 'Gladia'?

🔇 Additional comments (3)
chapters/limits-and-specifications/data-retention.mdx (3)

6-10: Clear data categorization.

The four data types are well-defined and provide a solid foundation for understanding the retention policies that follow.


12-15: Retention modes are clearly distinguished.

The distinction between Standard (0-12 months with periodic deletion) and Zero Data Retention (ephemeral processing, immediate deletion) is well articulated. Line 15 effectively addresses the need to explain that ZDR minimizes storage "at all stages" and avoids temporary storage.


17-19: Enterprise restriction clearly communicated.

The Info callout appropriately highlights that custom retention and zero data retention are Enterprise-only features.


When Zero Data Retention is enabled, Gladia processes data ephemerally; no data is stored at rest.

- **No audio files are stored**: Files cannot be retrieved through the API or in the [Gladia's playground](https://app.gladia.io/?utm_campaign=197609689-Website&utm_source=documentation&utm_content=sign-up). File upload is also disabled; the asynchronous API must use `audio_url` inputs.
Contributor

the asynchronous API must use audio_url inputs

This is ambiguous. Maybe specify that it must be an external, public audio file URL?

- **No audio files are stored**: Files cannot be retrieved through the API or in the [Gladia's playground](https://app.gladia.io/?utm_campaign=197609689-Website&utm_source=documentation&utm_content=sign-up). File upload is also disabled; the asynchronous API must use `audio_url` inputs.
- **No transcripts are stored**: Transcription results cannot be retrieved through the API and are not visible in the [Gladia's playground](https://app.gladia.io/?utm_campaign=197609689-Website&utm_source=documentation&utm_content=sign-up).
- **No metadata retrieval**: Transcription API calls, audio duration, and other metadata cannot be retrieved through the API or in the [Gladia's playground](https://app.gladia.io/?utm_campaign=197609689-Website&utm_source=documentation&utm_content=sign-up).

Contributor

Then maybe add a bullet point to specify that the only way to get your transcription result is by using webhooks/callbacks.


Only callbacks return the transcription itself.
With a webhook, you only receive a notification that the transcription has ended, with no data.
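
Because the callback is the only time the transcript is delivered under Zero Data Retention, the receiving service has to store the payload itself. The following is a minimal Python sketch of that idea, not Gladia's implementation: `persist_callback` is a hypothetical helper, and it assumes only that the callback body is JSON, without relying on any particular payload field.

```python
import hashlib
import json
from pathlib import Path


def persist_callback(body: bytes, directory: Path) -> Path:
    """Store one callback body on disk and return the file path.

    Hypothetical helper: the payload schema is not assumed, only that the
    body is JSON. The file name is derived from the body's SHA-256 digest,
    so a redelivered callback overwrites the same file instead of creating
    a duplicate.
    """
    json.loads(body)  # raises ValueError if the body is not valid JSON
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{hashlib.sha256(body).hexdigest()}.json"
    tmp = target.with_suffix(".tmp")
    tmp.write_bytes(body)  # write to a temporary file first
    tmp.replace(target)    # then rename, so readers never see a partial file
    return target
```

Naming the file after a hash of the body keeps redelivery idempotent. A real receiver would sit behind an HTTP endpoint and would probably write to a database or object store rather than the local disk.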

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
chapters/limits-and-specifications/data-retention.mdx (1)

27-27: Consider clarifying URL accessibility requirements.

The phrase "external audio file url" is technically correct, but you might clarify whether the URL needs to be publicly accessible or if authenticated URLs (like S3 presigned URLs) are sufficient. The example "S3 presigned url" suggests authenticated URLs are acceptable, which is helpful.

🔎 Optional clarification
-File upload is also disabled; the asynchronous API must use an external audio file url, such as S3 presigned url.
+File upload is also disabled; the asynchronous API must use an externally accessible audio file URL (such as a public URL or S3 presigned URL).
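
One way to catch obviously unusable `audio_url` values before submitting a request is a string-level check like the one below. This is a hedged sketch: `looks_externally_reachable` is a hypothetical helper, not part of any Gladia SDK, and it only rejects non-HTTP schemes, `localhost`, and non-global IP literals. It does not resolve DNS or download anything, so passing the check does not guarantee that the file is fetchable.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_externally_reachable(audio_url: str) -> bool:
    """Heuristic pre-flight check for an `audio_url` value (hypothetical helper)."""
    try:
        parts = urlsplit(audio_url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL, e.g. an unterminated IPv6 literal
    if parts.scheme not in ("http", "https") or not host:
        return False
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; assumed here to resolve publicly
    return ip.is_global  # rejects loopback, private, and link-local addresses
```

A presigned URL passes this check like any other HTTPS URL; whether it is still valid when the file is fetched depends on its expiry, which this sketch does not inspect.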
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 8404a24 and 072ec8f.

📒 Files selected for processing (1)
  • chapters/limits-and-specifications/data-retention.mdx
🧰 Additional context used
🪛 GitHub Check: Mintlify Validation (gladia-95) - vale-spellcheck
chapters/limits-and-specifications/data-retention.mdx

[warning] 6-6: chapters/limits-and-specifications/data-retention.mdx#L6
Did you really mean 'Gladia'?


[warning] 21-21: chapters/limits-and-specifications/data-retention.mdx#L21
Did you really mean 'Gladia'?


[warning] 25-25: chapters/limits-and-specifications/data-retention.mdx#L25
Did you really mean 'Gladia'?


[warning] 27-27: chapters/limits-and-specifications/data-retention.mdx#L27
Did you really mean 'presigned'?

🔇 Additional comments (6)
chapters/limits-and-specifications/data-retention.mdx (6)

1-4: LGTM!

The frontmatter is properly structured with clear title and description.


6-10: LGTM!

The data type categorization is clear and well-structured.


12-15: Clear distinction between Standard (0 days) and Zero Data Retention.

The updated wording effectively clarifies that Standard retention with 0 days results in deletion within 24 hours, while Zero Data Retention minimizes storage at all stages and deletes immediately. This addresses previous concerns about ambiguity.


17-19: LGTM!

The Enterprise eligibility notice is clear and uses correct grammar.


21-21: Clarify whether minimal metadata is retained under Zero Data Retention.

Line 21 states that essential API metadata (request ID, timestamp, processing status, audio duration) is retained for usage tracking. However, the Zero Data Retention section doesn't explicitly state whether this minimal metadata is still stored under ZDR mode, or if "No metadata retrieval" (line 29) means it's also not persisted.

For compliance and user trust, explicitly state in the ZDR section whether any metadata is retained for usage tracking/billing, even when ZDR is enabled.

🔎 Suggested clarification

Add a note in the ZDR section (after line 29 or 31):

**Note:** Even with Zero Data Retention enabled, minimal metadata (request ID, timestamp, processing status, audio duration) is retained for usage tracking and billing purposes as described above.

Or if no metadata is retained under ZDR, clarify that in line 21:

-To enable usage tracking, Gladia retains essential API metadata: request ID, timestamp, processing status and audio duration. Immutable logs are also maintained, for a limited period, to ensure service quality and reliability.
+To enable usage tracking, Gladia retains essential API metadata: request ID, timestamp, processing status and audio duration (except when Zero Data Retention is enabled). Immutable logs are also maintained, for a limited period, to ensure service quality and reliability.

Based on past review comments addressing similar concerns.


30-32: LGTM!

The callback delivery requirement and post-delivery inaccessibility are clearly stated, addressing previous comments about delivery mechanisms.

5 participants