Labels
enhancement (New feature or request)
Description
Describe the bug
Queries on diffs are extremely slow even for moderately large repositories. Our repository at work has ~5,500 commits.
The following query, which finds the diff with the most deletions, took about 27 minutes:
❯ time .cargo/bin/gitql --query 'select * from diffs order by deletions desc limit 1'
╭──────────────────────────────────────────┬───────────────────┬───────────────────────┬────────────┬───────────┬───────────────┬─────────────────────────┬───────────────────────────────────╮
│ commit_id ┆ name ┆ email ┆ insertions ┆ deletions ┆ files_changed ┆ datetime ┆ repo │
╞══════════════════════════════════════════╪═══════════════════╪═══════════════════════╪════════════╪═══════════╪═══════════════╪═════════════════════════╪═══════════════════════════════════╡
│ 8b685201464c3027afe9105bb5ed9b40a1befce7 ┆ Matthew Planchard ┆ [email protected] ┆ 3284 ┆ 41552 ┆ 212 ┆ 2024-08-15 18:15:45.000 ┆ /home/matthew/s/spec/.git │
╰──────────────────────────────────────────┴───────────────────┴───────────────────────┴────────────┴───────────┴───────────────┴─────────────────────────┴───────────────────────────────────╯
________________________________________________________
Executed in 27.37 mins fish external
usr time 27.25 mins 569.00 micros 27.25 mins
sys time 0.04 mins 0.00 micros 0.04 mins
For the entire run, a single thread was pegged. I can find the same commit using git and awk in a fraction of the time (roughly 1/270th, or 0.37%):
❯ time git log --pretty="@%h" --shortstat | tr "\n" " " | tr "@" "\n" | awk '{if ($7 > deletions) { deletions = $7; commit = $1 }}; END { print commit; print deletions }'
8b6852014
41720
________________________________________________________
Executed in 6.01 secs fish external
usr time 5.41 secs 0.00 millis 5.41 secs
sys time 0.63 secs 1.78 millis 0.63 secs
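One caveat with the awk one-liner: it reads field 7 as the deletion count, which only holds when a commit's --shortstat line lists both insertions and deletions. Git omits whichever part is zero, so a deletion-only commit would be missed. Below is a minimal sketch of a more defensive baseline in Python, assuming the same `git log --pretty="@%h" --shortstat` output format. The names `max_deletions` and `read_git_log` are hypothetical helpers for illustration and are not part of GitQL.

```python
import re
import subprocess

# Matches the deletions part of a --shortstat line, e.g. "41552 deletions(-)"
# or "1 deletion(-)". The insertions part may be absent on the same line.
DELETIONS_RE = re.compile(r"(\d+) deletions?\(-\)")


def max_deletions(log_text):
    """Return (commit, deletions) for the commit with the most deletions.

    Expects text produced by: git log --pretty="@%h" --shortstat
    Returns (None, -1) if no commit reports any deletions.
    """
    best_commit, best_deletions = None, -1
    current = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            # A new commit header such as "@8b6852014".
            current = line[1:].strip()
            continue
        match = DELETIONS_RE.search(line)
        if match and current is not None:
            deletions = int(match.group(1))
            if deletions > best_deletions:
                best_commit, best_deletions = current, deletions
    return best_commit, best_deletions


def read_git_log(repo_path="."):
    """Run git log in repo_path and return its stdout as text."""
    result = subprocess.run(
        ["git", "log", "--pretty=@%h", "--shortstat"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `max_deletions(read_git_log())` from inside a repository gives a baseline to compare against the GitQL query. Commits with no stat line (for example, merges) are skipped.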
Queries on commits seem to run in a more reasonable amount of time, e.g.:
❯ time .cargo/bin/gitql --query "select count(author_name) from commits where author_name like '%matthew%'"
╭──────────╮
│ column_2 │
╞══════════╡
│ 1001 │
╰──────────╯
________________________________________________________
Executed in 357.45 millis fish external
usr time 351.94 millis 0.00 micros 351.94 millis
sys time 4.62 millis 641.00 micros 3.98 millis
To Reproduce
- Check out any large repo (ours has ~5,500 commits)
- Run the diffs query above (select * from diffs order by deletions desc limit 1)
Expected behavior
Queries on diffs run at least within an order of magnitude of the equivalent git/awk pipeline.
GQL (please complete the following information):
GitQL version 0.28.0