Hey there,
We are using CKAN 2.10.4 with harvest 1.5.6 and dcat 1.7.0.
For about six months we have observed the harvest_object table growing steadily.
It is currently around 7 GB, even though our harvesters run only once a day and handle around 30k datasets per run.
Our package table is only around 32 MB, so we are puzzled: is there really no automatic cleanup mechanism?
I checked the docs, went through various CLI commands with --help, and took a brief look at the code, but came away without any further ideas.
How can I clean up this table without breaking too much? Would it be safe to delete all records that finished more than two months ago?
I already saw #347, but redirecting the data to Redis does not sound promising to me: Redis simply evicts inserted data once it hits its configured memory limit, even if the corresponding harvester run has not finished yet.
So I would like to see an automatic cleanup when jobs finish, while keeping records that caused errors for comparison purposes.
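To make the idea concrete, here is a rough sketch of the cleanup rule I have in mind, simulated against an in-memory SQLite stand-in for the table. The column names `state` and `import_finished` and the state value `COMPLETE` are my assumptions from a quick look at the harvest extension's model, not verified against the real schema, and a real cleanup would need to run in batches inside a transaction:

```python
import sqlite3
from datetime import datetime, timedelta

# Simplified stand-in for the harvest_object table; the real table in
# ckanext-harvest has more columns, but this sketch only relies on a
# state column and a finish timestamp (names assumed, not verified).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE harvest_object (
        id TEXT PRIMARY KEY,
        state TEXT,
        import_finished TEXT
    )
""")

now = datetime.utcnow()
rows = [
    ("old-complete", "COMPLETE", (now - timedelta(days=90)).isoformat()),
    ("recent-complete", "COMPLETE", (now - timedelta(days=10)).isoformat()),
    ("old-error", "ERROR", (now - timedelta(days=90)).isoformat()),  # kept
]
conn.executemany("INSERT INTO harvest_object VALUES (?, ?, ?)", rows)

# The proposed rule: delete successfully finished records older than
# two months, but keep error records for comparison purposes.
cutoff = (now - timedelta(days=60)).isoformat()
conn.execute(
    "DELETE FROM harvest_object "
    "WHERE state = 'COMPLETE' AND import_finished < ?",
    (cutoff,),
)
conn.commit()

remaining = [r[0] for r in conn.execute("SELECT id FROM harvest_object ORDER BY id")]
print(remaining)  # → ['old-error', 'recent-complete']
```

On the real database I assume related rows in tables such as harvest_object_error and harvest_object_extra would have to be removed alongside, otherwise the delete would either fail on foreign keys or leave orphans, which is part of why I am asking for best practices rather than just running this.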
Could you give me some advice on best practices?
Thank you very much and have a nice week.