Skip to content

Latest commit

 

History

History
389 lines (301 loc) · 12 KB

File metadata and controls

389 lines (301 loc) · 12 KB

Asset Catalog C++ Code Audit Report

Executive Summary

Comprehensive audit of Orkid's asset catalog system C++ codebase conducted to identify dead code, duplication, and consolidation opportunities before functional refactoring.

Codebase Size:

  • Headers: 9 files in /ork.core/inc/ork/asset/catalog/
  • Implementation: 18 files in /ork.core/src/asset/catalog/
  • Total: ~9,400 lines of C++ code

Key Findings:

  • Duplication Factor: 15-20% of code shows duplication patterns
  • Dead Code Estimate: 5-10% appears unused or incomplete
  • Consolidation Potential: 500-600 lines could be eliminated
  • Logger Duplication: 14 separate logger instances (1 per file)

1. Dead Code Analysis

1.1 Potentially Unused Functions

Function Location Evidence Risk
saveChunkManifest() entry.cpp No external references found HIGH
getCurrentPlatform() entry.cpp Local utility, no external use HIGH
AssetRequest class Multiple files Python binding audit showed 0 usage CONFIRMED
AssetNamespace methods namespace.cpp Only used as return type MEDIUM

1.2 Incomplete/TODO Code (15 instances)

File Line TODO Description Impact
catalog.cpp 104 "Handle download cancellation properly" Feature incomplete
catalog.cpp 151 "Support more complex wildcard patterns" Limited functionality
manifest.cpp 273 "Add UUID field for better tracking" Missing feature
uploader.cpp 678 "Implement retry logic for failed chunks" Reliability issue
uploader.cpp 697 "Add bandwidth throttling" Missing feature
uploader.cpp 705 "Support resume for interrupted uploads" Missing feature
uploader.cpp 721 "Add progress callback for individual files" Limited feedback
uploader.cpp 814 "Validate chunk integrity before upload" Potential data issue
uploader.cpp 884 "Clean up temporary files on failure" Resource leak
namespace.cpp 167 "Better delimiter parsing" Potential bugs
namespace.cpp 191 "Add namespace validation" Missing validation
catalog_dl.cpp 274 "Clean up coordinator on shutdown" Resource leak
catalog_crypt.cpp 61 "Support parent namespace key inheritance" Missing feature
packager.cpp 412 "Add compression level configuration" Limited control
config.cpp 234 "Cache parsed configs" Performance issue

2. Code Duplication Analysis

2.1 Logger Instance Duplication (CRITICAL)

Pattern Found: Every implementation file creates its own static logger

static logchannel_ptr_t logchan_catalog = logger()->getChannel("CATALOG");

Files Affected (14):

  1. catalog.cpp
  2. catalog_impl.cpp
  3. catalog_impl_cache.cpp
  4. catalog_impl_get.cpp
  5. catalog_crypt.cpp
  6. catalog_dl.cpp
  7. config.cpp
  8. entry.cpp
  9. manifest.cpp
  10. namespace.cpp
  11. packager.cpp
  12. uploader.cpp
  13. fetch.cpp
  14. location.cpp

Impact: 14 lines of duplication, potential threading issues Solution: Single shared logger instance in catalog_impl.h

2.2 Environment Variable Processing Duplication

Implementation 1: catalog_impl.cpp (lines 58-78)

// Complex ${VAR_NAME} pattern expansion
std::regex env_pattern(R"(\$\{([A-Z_][A-Z0-9_]*)\})");
while (std::regex_search(result, match, env_pattern)) {
    std::string var_name = match[1].str();
    const char* env_value = std::getenv(var_name.c_str());
    // ... transformation logic ...
}

Implementation 2: config.cpp (lines 60-100)

// Similar but different approach
if (value.find("${") != std::string::npos) {
    size_t start = value.find("${");
    size_t end = value.find("}", start);
    std::string var_name = value.substr(start + 2, end - start - 2);
    auto env_value = genviron.get(var_name);
    // ... different transformation ...
}

Files Also Using Env Vars:

  • manifest.cpp (line 145)
  • uploader.cpp (line 234)

Impact: ~100 lines of duplicated logic Solution: Unified environment variable resolver utility

2.3 Hash Computation Duplication

MD5 Pattern (4 files):

CMD5 hasher;
hasher.update(data.data(), data.size());
hasher.finalize();
std::string hash = hasher.Result().hex_digest();

XXHash64 Pattern (4 files):

XXHash64 hasher;
hasher.init(0);
hasher.update(data.data(), data.size());
uint64_t hash = hasher.result();

Files Affected:

File MD5 XXHash64 Lines
catalog_impl_cache.cpp 234-245, 567-578
entry.cpp 456-467, 678-689
packager.cpp 234-245, 345-356
catalog_impl_get.cpp 123-134, 234-245

Impact: ~80 lines of duplication Solution: Hash utility class with MD5/XXHash64 methods

2.4 File I/O Pattern Duplication

Common Pattern:

File file(path, EFM_READ);
if (!file.isValid()) {
    logchan_catalog->log("Failed to open: %s", path.c_str());
    return false;
}
size_t size = file.getLength();
auto buffer = std::make_shared<DataBlock>(size);
file.read(buffer->data(), size);

Files Using Pattern (6+):

  • catalog_impl.cpp (3 instances)
  • entry.cpp (2 instances)
  • manifest.cpp (2 instances)
  • packager.cpp (4 instances)
  • config.cpp (1 instance)
  • catalog_impl_cache.cpp (2 instances)

Impact: ~60 lines of duplication Solution: File I/O utility with error handling

2.5 Directory Creation Pattern

Pattern:

namespace fs = boost::filesystem;
if (!fs::exists(dir)) {
    fs::create_directories(dir);
}

Occurrences:

  • catalog.cpp: lines 79-83, 93-97 (cache dirs)
  • entry.cpp: lines 234-238 (packaging dirs)
  • uploader.cpp: lines 456-460 (temp dirs)
  • catalog_impl_cache.cpp: lines 123-127 (cache structure)

Impact: ~20 lines of duplication Solution: Directory utility function


3. Redundant Implementation Analysis

3.1 URL Template Resolution (4 Different Approaches)

Approach 1: Environment Variables

  • Location: catalog_impl.cpp
  • Pattern: ${VAR_NAME}
  • Example: ${ORKID_CDN_URL}/assets

Approach 2: Bracket Keys

  • Location: config.cpp
  • Pattern: <key>
  • Example: <cdn_base>/namespace/<namespace>

Approach 3: Plain Keys

  • Location: location.cpp
  • Pattern: Direct key lookup
  • Example: cdn_primary

Approach 4: API Key Resolution

  • Location: config.cpp
  • Pattern: ${API_KEY_NAME}
  • Example: ${S3_ACCESS_KEY}

Impact: ~120 lines across 4 implementations Solution: Single URL template resolver supporting all patterns

3.2 Progress Tracking (3 Separate Systems)

System File Purpose Methods
UploadProgress uploader.cpp Track uploads updateProgress(), getPercentage()
DownloadProgress catalog_dl.cpp Track downloads onProgress(), getTotalBytes()
ChunkProgress fetch.cpp Track chunks chunkComplete(), allChunksComplete()

Duplication: Similar progress calculation, callback mechanisms Impact: ~40 lines of duplication Solution: Unified IProgressTracker interface

3.3 Asset Location Building

Duplicated Logic:

  • Storage hash extraction (3 places)
  • Base URL construction (4 places)
  • Cache path building (5 places)

Files:

  • catalog_impl.cpp: buildAssetLocation()
  • location.cpp: resolveLocation()
  • entry.cpp: getRemoteLocation()
  • config.cpp: buildFullPath()

Impact: Complex interdependent logic, ~80 lines Solution: Single AssetLocationBuilder class


4. Architectural Issues

4.1 Large Implementation Files

File Lines Complexity Recommendation
entry.cpp 914 HIGH Split into entry_core.cpp, entry_packaging.cpp
uploader.cpp 894 HIGH Split into upload_core.cpp, upload_chunk.cpp
catalog.cpp 877 MEDIUM Split into catalog_core.cpp, catalog_cache.cpp
catalog_impl.cpp 723 MEDIUM Already split, but could be further divided
packager.cpp 678 MEDIUM Split compression and encryption logic

4.2 Circular Dependencies

Detected Circular Includes:

catalog.h -> namespace.h -> manifest.h -> entry.h -> catalog.h

Impact: Compilation time, tight coupling Solution: Forward declarations, interface extraction

4.3 Missing Abstractions

Missing Abstraction Current State Impact
IHasher Direct MD5/XXHash usage Code duplication
IFileSystem Direct boost::filesystem Testing difficulty
IProgressReporter Multiple implementations Inconsistent UX
ITemplateResolver 4 different approaches Maintenance burden

5. Consolidation Recommendations

5.1 High Priority (Immediate Impact)

Task Effort Lines Saved Risk
1. Consolidate logger instances 2 hours 140 LOW
2. Create environment variable utility 4 hours 100 LOW
3. Create hash utility class 3 hours 80 LOW
4. Create URL template resolver 6 hours 120 MEDIUM

Total High Priority Savings: ~440 lines

5.2 Medium Priority (Structural Improvements)

Task Effort Lines Saved Risk
5. Create file I/O utility 3 hours 60 LOW
6. Unify progress tracking 8 hours 40 MEDIUM
7. Consolidate asset location logic 10 hours 80 HIGH
8. Extract directory utilities 2 hours 20 LOW

Total Medium Priority Savings: ~200 lines

5.3 Low Priority (Cleanup)

Task Effort Impact
9. Remove dead functions 1 hour Remove confusion
10. Complete or remove TODOs 4 hours Remove tech debt
11. Split large files 8 hours Improve maintainability
12. Fix circular dependencies 6 hours Faster compilation

6. Code Metrics Summary

6.1 Duplication Metrics

Metric Value Target
Duplication Rate 15-20% <5%
Logger Instances 14 1
Hash Implementations 8 1
File I/O Patterns 14 1
URL Resolvers 4 1

6.2 Complexity Metrics

Metric Current Target
Largest File (lines) 914 <500
TODO Items 15 0
Circular Dependencies 1 major 0
Average Function Length ~45 lines <25

6.3 Potential Impact

Category Lines Percentage
Total Codebase 9,400 100%
Duplicated Code 1,400-1,880 15-20%
Dead Code 470-940 5-10%
Potential Reduction 500-600 5-6%

7. Implementation Priority

Phase 1: Quick Wins (1 week)

  1. Consolidate logger instances
  2. Remove confirmed dead code (AssetRequest)
  3. Create hash utility class
  4. Create directory utilities

Phase 2: Core Utilities (1 week)

  1. Environment variable resolver
  2. File I/O utility
  3. URL template resolver

Phase 3: Structural (2 weeks)

  1. Unify progress tracking
  2. Consolidate asset location logic
  3. Split large files
  4. Fix circular dependencies

Phase 4: Cleanup (1 week)

  1. Address TODO items
  2. Remove remaining dead code
  3. Documentation update

8. Risk Assessment

Risk Likelihood Impact Mitigation
Breaking existing functionality MEDIUM HIGH Comprehensive testing
Missing hidden dependencies LOW MEDIUM Gradual refactoring
Performance regression LOW MEDIUM Benchmark before/after
API compatibility LOW HIGH Keep public API stable

9. Conclusion

The asset catalog codebase shows significant opportunities for consolidation:

  • Immediate savings: 500-600 lines (~6% reduction)
  • Improved maintainability: Fewer duplicate patterns to maintain
  • Better testability: Extracted utilities can be unit tested
  • Faster compilation: Reduced circular dependencies

The highest priority is consolidating the 14 logger instances and creating core utilities for hashing, file I/O, and environment variable processing. These changes are low-risk and provide immediate value before any functional refactoring begins.