feat: add audit for unused content fragments #1574

Closed
holtvogt wants to merge 38 commits into main from feat/add-unused-content-fragments-audit

@holtvogt holtvogt commented Nov 14, 2025

This PR introduces a new audit to monitor and identify unused content fragments on AEM to optimize content governance and reduce system overhead. See SITES-36578.

What's New

Content Fragment Unused Audit (content-fragment-unused): Analyzes content fragments in AEM to identify unused content that has remained in draft, modified, or unpublished states for extended periods (90+ days). The audit categorizes fragments by their lifecycle status and provides detailed statistics including age distribution, counts, and percentages to help teams prioritize cleanup efforts.

The audit identifies three categories of unused content:

  • NEW: Fragments created but never published
  • DRAFT: Fragments created and modified but never published
  • UNPUBLISHED: Fragments that were published then unpublished without further modifications
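The status and age check described above could be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the field names (`status`, `createdAt`, `modifiedAt`, `path`) and both function names are assumptions, and only the 90-day threshold and the three status names come from the description.

```javascript
// Hypothetical constants: the PR description states a 90+ day threshold
// and the NEW, DRAFT, and UNPUBLISHED statuses.
const UNUSED_STATUSES = new Set(['NEW', 'DRAFT', 'UNPUBLISHED']);
const THRESHOLD_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the fragments whose status is in UNUSED_STATUSES and whose last
// activity is at least THRESHOLD_DAYS old. Field names are assumptions.
function findUnusedFragments(fragments, now = new Date()) {
  const unused = [];
  for (const fragment of fragments) {
    const status = (fragment.status || '').toUpperCase();
    if (!UNUSED_STATUSES.has(status)) continue;

    // Fall back to the creation date when the fragment was never modified.
    const lastActivity = new Date(fragment.modifiedAt || fragment.createdAt);
    if (Number.isNaN(lastActivity.getTime())) continue; // skip unparseable dates

    const ageInDays = Math.floor((now - lastActivity) / MS_PER_DAY);
    if (ageInDays >= THRESHOLD_DAYS) {
      unused.push({ path: fragment.path, status, ageInDays });
    }
  }
  return unused;
}

// Computes a count and a percentage (one decimal place) per status.
function summarizeByStatus(unused) {
  const counts = {};
  for (const { status } of unused) {
    counts[status] = (counts[status] || 0) + 1;
  }
  return Object.entries(counts).map(([status, count]) => ({
    status,
    count,
    percentage: Math.round((count / unused.length) * 1000) / 10,
  }));
}
```

Passing `now` as a parameter keeps the age calculation deterministic in tests.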

Use Case

Content governance optimization for AEM Sites: Automatically detect stale and unused content fragments across AEM Sites by analyzing content lifecycle status and age. The audit provides actionable insights to help teams identify orphaned drafts, abandoned work-in-progress, and outdated content, enabling focus on active content management. Each finding includes detailed metadata such as fragment age, last modification date, and publication history to support informed cleanup decisions.

Related

@github-actions

This PR will trigger a minor release when merged.

codecov Bot commented Dec 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a new audit for AEM Sites that identifies unused content fragments to optimize content governance. The audit analyzes fragments in draft, new, unpublished, or modified states that have been inactive for 90+ days, providing detailed statistics and actionable insights for cleanup efforts.

Key Changes:

  • Implements content-fragment-unused audit with lifecycle status analysis (NEW, DRAFT, UNPUBLISHED, MODIFIED)
  • Adds AEM client infrastructure for content fragment API integration with IMS authentication
  • Implements S3 storage for detailed fragment data with organized date-based path structure
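As an illustration of a date-based path structure, an object key could be built along these lines. The review does not show the actual key layout, so the prefix, the `siteId` and `auditId` parameters, and the function name are all hypothetical.

```javascript
// Hypothetical key layout (not taken from the PR):
//   content-fragment-unused/<siteId>/<yyyy>/<mm>/<dd>/<auditId>.json
function buildFragmentDataKey(siteId, auditId, date = new Date()) {
  // UTC parts are used so the key does not depend on the worker's time zone.
  const yyyy = date.getUTCFullYear();
  const mm = String(date.getUTCMonth() + 1).padStart(2, '0');
  const dd = String(date.getUTCDate()).padStart(2, '0');
  return `content-fragment-unused/${siteId}/${yyyy}/${mm}/${dd}/${auditId}.json`;
}
```

Zero-padded date segments keep keys lexicographically sortable, which makes it straightforward to list one day's or one month's results by prefix.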

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/index.js | Registers the new content-fragment-unused audit handler in the main handler registry |
| src/content-fragment-unused/handler.js | Main audit orchestration with runner and post-processor for creating opportunities and suggestions |
| src/content-fragment-unused/storage/s3-storage.js | S3 utilities for uploading/downloading fragment data with date-partitioned paths |
| src/content-fragment-unused/opportunity-data-mapper.js | Defines opportunity metadata including title, description, and tags |
| src/content-fragment-insights/aem-analyzer.js | High-level analyzer coordinating fragment fetching and analysis with pagination and retry logic |
| src/content-fragment-insights/fragment-analyzer.js | Core logic for identifying unused fragments based on age thresholds and lifecycle status |
| src/content-fragment-insights/clients/aem-client.js | AEM Sites API client with IMS authentication and token management |
| test/audits/content-fragment-unused/handler.test.js | Comprehensive test coverage for audit runner and suggestion creation |
| test/audits/content-fragment-unused/s3-storage.test.js | Tests for S3 storage operations including upload/download and error handling |
| test/audits/content-fragment-unused/opportunity-data-mapper.test.js | Tests validating opportunity data structure and content |
| test/audits/content-fragment-insights/aem-analyzer.test.js | Tests for analyzer including pagination, retry logic, and fragment parsing |
| test/audits/content-fragment-insights/fragment-analyzer.test.js | Tests for unused fragment detection logic and threshold handling |
| test/audits/content-fragment-insights/aem-client.test.js | Tests for AEM API client including authentication and fragment retrieval |
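The "pagination and retry logic" mentioned for the analyzer could look something like the sketch below. It is a generic cursor-based loop with exponential backoff, not the code under review: `fetchPage`, the `{ items, cursor }` page shape, and the option names are assumptions made for illustration.

```javascript
// fetchPage is a hypothetical callback: (cursor) => Promise<{ items, cursor }>.
// A falsy cursor in the response is taken to mean "no more pages".
async function fetchAllFragments(fetchPage, { maxRetries = 3, baseDelayMs = 200 } = {}) {
  const all = [];
  let cursor = null;
  do {
    let page;
    // Retry the current page with exponential backoff before giving up.
    for (let attempt = 0; ; attempt += 1) {
      try {
        page = await fetchPage(cursor);
        break;
      } catch (err) {
        if (attempt >= maxRetries) throw err;
        const delayMs = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
    all.push(...page.items);
    cursor = page.cursor || null;
  } while (cursor);
  return all;
}
```

Retrying per page rather than restarting the whole crawl means a transient failure late in a large repository does not discard the pages already fetched.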

Comment thread src/content-fragment-unused/opportunity-data-mapper.js Outdated
Comment thread test/audits/content-fragment-unused/opportunity-data-mapper.test.js Outdated
continue;
}

// TODO: Check MODIFIED content to be unpublished before adding to unused fragments

Copilot AI Dec 5, 2025

The TODO comment indicates that the MODIFIED status handling is incomplete. Fragments with MODIFIED status are currently being included in unused fragments without verifying if they are actually unpublished. This could lead to false positives where published fragments with modifications are incorrectly flagged as unused.

Suggested change
// TODO: Check MODIFIED content to be unpublished before adding to unused fragments
// For MODIFIED fragments, only consider them unused if they are unpublished
if (
  fragment.status &&
  fragment.status.toUpperCase() === 'MODIFIED' &&
  fragment.publishedAt
) {
  // Fragment is MODIFIED but published, so not unused
  // eslint-disable-next-line no-continue
  continue;
}

Author

The publishedAt timestamp doesn't tell us whether the fragment was recently unpublished and then modified; it only records when the fragment was last published. I still need to figure out how to determine this more precisely, or otherwise consider removing MODIFIED fragments from the audit completely.

Comment thread src/content-fragment-unused/storage/s3-storage.js
Comment thread src/content-fragment-unused/storage/s3-storage.js
holtvogt and others added 5 commits December 5, 2025 14:37
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This update simplifies the handling of content fragment statuses by removing 'MODIFIED' from the list of unused content statuses.
holtvogt added a commit to adobe/spacecat-shared that referenced this pull request Dec 10, 2025
Please ensure your pull request adheres to the following guidelines:

- [x] make sure to link the related issues in this description
- [x] when merging / squashing, make sure the fixed issue references are
visible in the commits, for easy compilation of release notes

## Related

- adobe/spacecat-audit-worker#1574
- adobe/spacecat-autofix-worker#321
@holtvogt holtvogt marked this pull request as ready for review December 10, 2025 10:49
@solaris007 solaris007 closed this Apr 7, 2026
3 participants