feat: add audit for unused content fragments #1574
This PR will trigger a minor release when merged.
…integration and error handling
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Pull request overview
This PR introduces a new audit for AEM Sites that identifies unused content fragments to optimize content governance. The audit analyzes fragments in draft, new, unpublished, or modified states that have been inactive for 90+ days, providing detailed statistics and actionable insights for cleanup efforts.
Key Changes:
- Implements `content-fragment-unused` audit with lifecycle status analysis (NEW, DRAFT, UNPUBLISHED, MODIFIED)
- Adds AEM client infrastructure for content fragment API integration with IMS authentication
- Implements S3 storage for detailed fragment data with organized date-based path structure
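
As an illustration of the date-based path structure mentioned in the last bullet, here is a minimal sketch of a key builder. The function name `buildFragmentReportKey`, the `content-fragment-unused` prefix, and the `fragments.json` file name are assumptions made for this example; the actual layout in `s3-storage.js` may differ.

```javascript
// Hypothetical helper: builds a date-partitioned S3 key for a fragment report.
// Prefix, segment order, and file name are illustrative assumptions,
// not the layout used by the PR.
function buildFragmentReportKey(siteId, date = new Date(), prefix = 'content-fragment-unused') {
  // Use UTC so the partition does not depend on the worker's time zone.
  const year = String(date.getUTCFullYear());
  const month = String(date.getUTCMonth() + 1).padStart(2, '0');
  const day = String(date.getUTCDate()).padStart(2, '0');
  return `${prefix}/${siteId}/${year}/${month}/${day}/fragments.json`;
}
```

Zero-padded, most-significant-first date segments keep keys lexicographically sortable, which makes listing by prefix (for example, one month of reports) straightforward.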
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/index.js | Registers the new content-fragment-unused audit handler in the main handler registry |
| src/content-fragment-unused/handler.js | Main audit orchestration with runner and post-processor for creating opportunities and suggestions |
| src/content-fragment-unused/storage/s3-storage.js | S3 utilities for uploading/downloading fragment data with date-partitioned paths |
| src/content-fragment-unused/opportunity-data-mapper.js | Defines opportunity metadata including title, description, and tags |
| src/content-fragment-insights/aem-analyzer.js | High-level analyzer coordinating fragment fetching and analysis with pagination and retry logic |
| src/content-fragment-insights/fragment-analyzer.js | Core logic for identifying unused fragments based on age thresholds and lifecycle status |
| src/content-fragment-insights/clients/aem-client.js | AEM Sites API client with IMS authentication and token management |
| test/audits/content-fragment-unused/handler.test.js | Comprehensive test coverage for audit runner and suggestion creation |
| test/audits/content-fragment-unused/s3-storage.test.js | Tests for S3 storage operations including upload/download and error handling |
| test/audits/content-fragment-unused/opportunity-data-mapper.test.js | Tests validating opportunity data structure and content |
| test/audits/content-fragment-insights/aem-analyzer.test.js | Tests for analyzer including pagination, retry logic, and fragment parsing |
| test/audits/content-fragment-insights/fragment-analyzer.test.js | Tests for unused fragment detection logic and threshold handling |
| test/audits/content-fragment-insights/aem-client.test.js | Tests for AEM API client including authentication and fragment retrieval |
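
The "pagination and retry logic" attributed to `aem-analyzer.js` above could look roughly like the following sketch. It assumes a cursor-based API: `fetchPage(cursor)` is a hypothetical callback resolving to `{ items, cursor }`, and `withRetry` and `fetchAllFragments` are names invented for this example rather than the PR's actual functions.

```javascript
// Hypothetical sketch: retries an async operation with exponential backoff.
async function withRetry(fn, { maxAttempts = 3, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => { setTimeout(resolve, delayMs); });
      }
    }
  }
  throw lastError;
}

// Hypothetical sketch: follows cursors until the API reports no further page.
// fetchPage(cursor) is an assumed callback, not the PR's client method.
async function fetchAllFragments(fetchPage, retryOptions) {
  const fragments = [];
  let cursor = null;
  do {
    const currentCursor = cursor;
    const page = await withRetry(() => fetchPage(currentCursor), retryOptions);
    fragments.push(...page.items);
    cursor = page.cursor ?? null;
  } while (cursor);
  return fragments;
}
```

Wrapping each page request separately means a transient failure only repeats the failed page, not the pages already collected.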
      continue;
    }

    // TODO: Check MODIFIED content to be unpublished before adding to unused fragments
The TODO comment indicates that the MODIFIED status handling is incomplete. Fragments with MODIFIED status are currently being included in unused fragments without verifying if they are actually unpublished. This could lead to false positives where published fragments with modifications are incorrectly flagged as unused.
    - // TODO: Check MODIFIED content to be unpublished before adding to unused fragments
    + // For MODIFIED fragments, only consider them unused if they are unpublished
    + if (
    +   fragment.status &&
    +   fragment.status.toUpperCase() === 'MODIFIED' &&
    +   fragment.publishedAt
    + ) {
    +   // Fragment is MODIFIED but published, so not unused
    +   // eslint-disable-next-line no-continue
    +   continue;
    + }
The `publishedAt` timestamp doesn't tell us whether a fragment was recently unpublished and then modified; it only records when the fragment was last published. We still need to figure out how to determine this more precisely, or otherwise consider removing MODIFIED fragments from the audit completely.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This update simplifies the handling of content fragment statuses by removing 'MODIFIED' from the list of unused content statuses.
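
A minimal sketch of what the simplified check might look like once MODIFIED is dropped, assuming a 90-day threshold as described in the PR. The names `UNUSED_STATUSES` and `findUnusedFragments`, and the fragment fields `modifiedAt` and `createdAt`, are assumptions for illustration and are not taken from `fragment-analyzer.js`.

```javascript
// Hypothetical constants: MODIFIED is intentionally absent from the set.
const UNUSED_STATUSES = new Set(['NEW', 'DRAFT', 'UNPUBLISHED']);
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical sketch: returns fragments whose status is in UNUSED_STATUSES
// and whose last activity is at least thresholdDays old.
// The modifiedAt / createdAt field names are assumed, not confirmed by the PR.
function findUnusedFragments(fragments, { thresholdDays = 90, now = new Date() } = {}) {
  const unused = [];
  for (const fragment of fragments) {
    const status = fragment.status?.toUpperCase();
    const lastActivity = new Date(fragment.modifiedAt ?? fragment.createdAt);
    // Skip other statuses and fragments without a usable timestamp.
    if (!UNUSED_STATUSES.has(status) || Number.isNaN(lastActivity.getTime())) continue;
    const ageDays = Math.floor((now - lastActivity) / DAY_MS);
    if (ageDays >= thresholdDays) unused.push({ ...fragment, status, ageDays });
  }
  return unused;
}
```

With this shape, excluding a status is a one-line change to the set, and no `publishedAt` heuristic is needed.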
Please ensure your pull request adheres to the following guidelines:
- [x] make sure to link the related issues in this description
- [x] when merging / squashing, make sure the fixed issue references are visible in the commits, for easy compilation of release notes

## Related
- adobe/spacecat-audit-worker#1574
- adobe/spacecat-autofix-worker#321
This PR introduces a new audit to monitor and identify unused content fragments on AEM to optimize content governance and reduce system overhead. See SITES-36578.
What's New
Content Fragment Unused Audit (`content-fragment-unused`): Analyzes content fragments in AEM to identify unused content that has remained in draft, modified, or unpublished states for extended periods (90+ days). The audit categorizes fragments by their lifecycle status and provides detailed statistics including age distribution, counts, and percentages to help teams prioritize cleanup efforts.

The audit identifies the following categories of unused content:
- NEW: Fragments created but never published
- DRAFT: Fragments created and modified but never published
- UNPUBLISHED: Fragments that were published then unpublished without further modifications

Use Case
Content governance optimization for AEM Sites: Automatically detect stale and unused content fragments across AEM Sites by analyzing content lifecycle status and age. The audit provides actionable insights to help teams identify orphaned drafts, abandoned work-in-progress, and outdated content, enabling focus on active content management. Each finding includes detailed metadata such as fragment age, last modification date, and publication history to support informed cleanup decisions.
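
The statistics described above (counts, percentages, and age distribution) could be derived along these lines. This is a sketch under assumptions: `summarizeUnusedFragments`, the `{ status, ageDays }` item shape, and the bucket boundaries are all hypothetical and not taken from the PR.

```javascript
// Hypothetical sketch: summarizes already-identified unused fragments.
// Assumes every item has ageDays >= 90, so the first bucket starts at 90.
// Bucket boundaries are illustrative choices, not the audit's actual ones.
function summarizeUnusedFragments(unused, totalFragments) {
  const byStatus = {};
  const ageBuckets = { '90-179': 0, '180-364': 0, '365+': 0 };
  for (const { status, ageDays } of unused) {
    byStatus[status] = (byStatus[status] ?? 0) + 1;
    if (ageDays >= 365) ageBuckets['365+'] += 1;
    else if (ageDays >= 180) ageBuckets['180-364'] += 1;
    else ageBuckets['90-179'] += 1;
  }
  // Share of all fragments that are unused, rounded to one decimal place.
  const percentage = totalFragments > 0
    ? Math.round((unused.length / totalFragments) * 1000) / 10
    : 0;
  return { total: unused.length, percentage, byStatus, ageBuckets };
}
```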
Related