Skip to content

Conversation

@lukaszgryglicki
Copy link
Member

This PR addresses slack feedback regarding creation/modification dates backfill.

cc @mlehotskylf @dealako @ahmedomosanya

Signed-off-by: Lukasz Gryglicki [email protected]

Assisted by OpenAI

Assisted by GitHub Copilot

Signed-off-by: Lukasz Gryglicki <[email protected]>

Assisted by [OpenAI](https://platform.openai.com/)

Assisted by [GitHub Copilot](https://github.com/features/copilot)
@coderabbitai
Copy link

coderabbitai bot commented Nov 21, 2025

Walkthrough

This change introduces approximate date fields (ApproxDateCreated, ApproxDateModified) to signature records and updates the backfill tool to conditionally set either exact or approximate date fields based on data source. Shell scripts are updated to remove the ALLOW_CURRENT_TIME flag.

Changes

Cohort / File(s) Change Summary
Approximate Date Fields Addition
cla-backend-go/cmd/signatures_timestamp_backfill/main.go
Added ApproxDateCreated and ApproxDateModified fields to SignatureRecord struct with DynamoDB mappings. Expanded UpdateStats with ApproxCreated and ApproxModified counters. Added corresponding constants (attrApproxDateCreated, attrApproxDateModified, exprSetApproxDateCreated, exprSetApproxDateModified). Updated update logic to conditionally set exact or approximate date fields based on source, and modified stats tracking and output printing.
Environment Configuration Updates
cla-backend-go/signatures_timestamp_backfill_dev.sh, cla-backend-go/signatures_timestamp_backfill_prod.sh
Removed ALLOW_CURRENT_TIME flag from environment variable setup in both development and production shell scripts. Updated commented examples to reflect the removal.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Conditional logic for setting exact vs. approximate date fields requires verification across Snowflake and DDB update paths
  • New struct fields and constants are straightforward but need confirmation they integrate correctly with existing update expressions and attribute mappings
  • Shell script changes are simple removals but should be verified against deployment processes

Possibly related PRs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title partially relates to the changeset, referencing date backfill feedback, but contains a typo ('feeback' instead of 'feedback') and is somewhat vague about the specific changes made.
Description check ✅ Passed The description is related to the changeset, referencing slack feedback about creation/modification dates backfill and includes proper attribution.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch unicron-if-fivetran-synced-is-used-update-approx-date-fields-in-backfill-script

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses Slack feedback regarding timestamp backfill by removing the ability to use current time for missing dates and introducing separate "approximate" fields for Fivetran-sourced timestamps.

Key Changes

  • Removed ALLOW_CURRENT_TIME environment variable and the finalNowFix() function that populated missing dates with current time
  • Introduced approx_date_created and approx_date_modified fields to store timestamps sourced from Fivetran sync data (which are approximate)
  • Updated statistics tracking and reporting to separately count updates to regular vs. approximate date fields

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
cla-backend-go/signatures_timestamp_backfill_prod.sh Removed ALLOW_CURRENT_TIME environment variable from production script
cla-backend-go/signatures_timestamp_backfill_dev.sh Removed ALLOW_CURRENT_TIME environment variable from development script
cla-backend-go/cmd/signatures_timestamp_backfill/main.go Removed finalNowFix() function and labelNow constant; added approx date fields to data model, update logic, and statistics; enhanced CLI fallback file with copyright header

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
cla-backend-go/cmd/signatures_timestamp_backfill/main.go (2)

52-52: Consider aligning the condition expression with the scan’s NULL handling

The scan filter treats AttributeType(..., "NULL") as “missing”, but condAnyMissing only checks attribute_not_exists or empty string. Rows with date_created/date_modified stored as DynamoDB NULL will match the scan but fail the condition, ending up in CLI fallbacks instead of being auto-fixed.

You may want to extend condAnyMissing to mirror the scan predicate (including the AttributeType(..., "NULL") checks) so the condition gate and the scan are consistent.

Also applies to: 270-280


605-612: Verify intended behavior for approx vs real fields in the Snowflake/Fivetran path

The Snowflake pass now:

  • Always exposes both real and approx names in names, and
  • Writes to approx_* only when srcC/srcM == labelFivetranSynced, otherwise to the real date_* fields, with stats routed to Approx* vs Created/Modified accordingly.

One subtle case to double‑check: when both created and modified are missing and _FIVETRAN_SYNCED is the only source, newC comes from Fivetran, and newM is then derived “from_created”, so srcC == labelFivetranSynced but srcM == labelFromCreated. This means you’ll set approx_date_created but the real date_modified, and ApproxModified stats won’t be incremented. If the product expectation is that both created and modified are considered approximate whenever they ultimately come from Fivetran, you may want to treat this “from_created but created itself is Fivetran-based” case as approximate too.

Also applies to: 618-625, 632-660

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 82f9935 and e26d703.

📒 Files selected for processing (3)
  • cla-backend-go/cmd/signatures_timestamp_backfill/main.go (8 hunks)
  • cla-backend-go/signatures_timestamp_backfill_dev.sh (1 hunks)
  • cla-backend-go/signatures_timestamp_backfill_prod.sh (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: CodeQL analysis (go)
  • GitHub Check: Agent
  • GitHub Check: cypress-functional
  • GitHub Check: build-test-lint
🔇 Additional comments (5)
cla-backend-go/signatures_timestamp_backfill_dev.sh (1)

5-5: Dev backfill invocation looks consistent with main.go expectations

The env wiring (DEBUG=true, STAGE=dev, DRY_RUN=true) and removal of ALLOW_CURRENT_TIME align cleanly with main()’s env handling; no functional issues from this change.

cla-backend-go/signatures_timestamp_backfill_prod.sh (1)

5-6: Prod script env flags are coherent and match Go helper semantics

Using DEBUG=''/DEBUG=true and DRY_RUN=true/DRY_RUN='' lines up with getEnvBool (only truthy strings enable flags), and removing ALLOW_CURRENT_TIME is consistent with the updated Go logic.

cla-backend-go/cmd/signatures_timestamp_backfill/main.go (3)

37-40: Approximate date fields and stats wiring are internally consistent

The introduction of approx_date_created/approx_date_modified (constants + struct fields), the parallel ApproxCreated/ApproxModified counters, and the extended printStats output form a coherent model: updates in Snowflake paths can cleanly distinguish exact vs approximate sources, and stats are initialized correctly via newStats() so no nil-map panics in current usage.

Also applies to: 46-52, 93-99, 111-124, 1125-1126


155-155: Nice: clearer startup logging and self‑contained fallback script header

Including DEBUG in the startup banner and emitting a full shebang + set -euo pipefail + license header into the fallback CLI file makes it much easier and safer to run the generated script directly in incident/debug scenarios.

Also applies to: 182-192


1080-1085: CLI builder correctly propagates approx_ values and default names*

Extending buildAwsCliUpdate to include :approx_date_created/:approx_date_modified in valsFlat and updating the fallback names JSON keeps the generated AWS CLI commands in sync with the new attributes; even if JSON marshaling fails, the hardcoded names now cover both exact and approximate fields.

Also applies to: 1093-1094

@lukaszgryglicki lukaszgryglicki merged commit 2ef46a5 into dev Nov 21, 2025
10 of 12 checks passed
@lukaszgryglicki lukaszgryglicki deleted the unicron-if-fivetran-synced-is-used-update-approx-date-fields-in-backfill-script branch November 21, 2025 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants