refactor: harden normalizeZoektRepoName against multi-segment repo paths #56

@Anthony-Bible

Problem

normalizeZoektRepoName in search_service.go strips the leading host segment from zoekt's host/owner/repo format by splitting on / and taking parts[1]+"/"+parts[2]. This breaks for:

  • GitLab nested groups: gitlab.com/org/subgroup/repo → strips only the host, returns org/subgroup/repo instead of the expected DB name
  • GitHub repos with subpaths in exotic configurations

If the normalization produces the wrong name, FindChunksByRepositoryPathAndLineRange / FindChunksByRepositoryPath return empty results silently, and the zoekt result falls back to a file-level DTO with no chunk resolution.
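
For illustration, here is a minimal sketch of the behavior described above. It assumes the function splits with `strings.SplitN` and a limit of 3, which is what would yield `org/subgroup/repo` for the nested-group case; the actual code in search_service.go may differ. `naiveNormalize` is a hypothetical stand-in, not the real function.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveNormalize is a hypothetical stand-in for normalizeZoektRepoName.
// Assumption: it splits into at most three parts and drops the first one,
// treating it as the host.
func naiveNormalize(zoektName string) string {
	parts := strings.SplitN(zoektName, "/", 3)
	if len(parts) < 3 {
		// Not in host/owner/repo form; return the input unchanged.
		return zoektName
	}
	return parts[1] + "/" + parts[2]
}

func main() {
	for _, name := range []string{
		"github.com/owner/repo",
		"gitlab.com/org/subgroup/repo",
	} {
		fmt.Printf("%s -> %s\n", name, naiveNormalize(name))
	}
}
```

Under that assumption the first input maps to `owner/repo` and the second to `org/subgroup/repo`. If the DB stores the nested-group repo under a different name, the lookup misses without any error.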

Suggested Fix

Options (pick one):

  1. Config-driven: Accept a list of known zoekt host prefixes to strip at SearchService construction time, so the mapping is explicit and testable.
  2. DB-driven: Query the repository by normalized URL pattern matching against fm.Repository rather than name-based lookup.
  3. Zoekt metadata: Configure zoekt to store the DB's canonical repo name in a custom field and read it back directly.
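
A rough sketch of what option 1 could look like. The type `repoNameNormalizer`, its constructor, and the host list are all hypothetical; in the real code the prefix list would presumably be a field on SearchService. Whether the stripped remainder equals the DB's canonical name is itself an assumption that depends on how repositories are stored.

```go
package main

import (
	"fmt"
	"strings"
)

// repoNameNormalizer is a hypothetical helper that strips configured
// zoekt host prefixes from repository names.
type repoNameNormalizer struct {
	hostPrefixes []string // each entry is stored with a trailing "/", e.g. "gitlab.com/"
}

// newRepoNameNormalizer builds the prefix list once, at construction time.
func newRepoNameNormalizer(hosts []string) repoNameNormalizer {
	prefixes := make([]string, 0, len(hosts))
	for _, h := range hosts {
		h = strings.Trim(h, "/")
		if h != "" {
			prefixes = append(prefixes, h+"/")
		}
	}
	return repoNameNormalizer{hostPrefixes: prefixes}
}

// normalize strips the first matching host prefix. The boolean reports
// whether a configured host matched, so callers can log or count misses
// instead of failing silently.
func (n repoNameNormalizer) normalize(zoektName string) (string, bool) {
	for _, p := range n.hostPrefixes {
		if strings.HasPrefix(zoektName, p) && len(zoektName) > len(p) {
			return strings.TrimPrefix(zoektName, p), true
		}
	}
	return zoektName, false
}

func main() {
	n := newRepoNameNormalizer([]string{"github.com", "gitlab.com"})
	for _, name := range []string{
		"gitlab.com/org/subgroup/repo",
		"example.org/a/b",
	} {
		out, ok := n.normalize(name)
		fmt.Println(name, "->", out, ok)
	}
}
```

Returning the match flag alongside the name is a design choice aimed at the silent-failure concern: an unknown host becomes observable at the call site rather than surfacing only as an empty chunk lookup.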

Files

  • internal/application/service/search_service.go: normalizeZoektRepoName
  • internal/application/service/search_service.go: resolveZoektChunk, resolveZoektChunkByPath

Impact

Low probability in current single-host deployments, but will silently degrade chunk resolution quality if multi-host or GitLab-style repos are indexed.
