
HUGE harvest_object table size #567

@dev-rke

Hey there,

We are using CKAN 2.10.4 with ckanext-harvest 1.5.6 and ckanext-dcat 1.7.0.

For about half a year now we have observed the harvest_object table growing steadily.
It is currently around 7 GB, even though our harvesters only run once a day and we handle around 30k datasets per day.
Our package table is only around 32 MB, so we are puzzled: is there really no automatic cleanup mechanism?
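
For reference, this is roughly how I compare the two table sizes (a minimal sketch using psycopg2; the connection string is a placeholder for our CKAN database):

```python
# Sketch: compare on-disk sizes of harvest_object and package in Postgres.
import psycopg2

DSN = "dbname=ckan user=ckan host=localhost"  # placeholder, adjust to your setup

with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        for table in ("harvest_object", "package"):
            cur.execute(
                "SELECT pg_size_pretty(pg_total_relation_size(%s))", (table,)
            )
            print(table, cur.fetchone()[0])
```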

I checked the docs, tried various CLI commands with --help, and took a quick look at the code, but came away without any further ideas.

How can I clean up this table without breaking too much? Should I delete all records that finished more than two months ago?
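
To avoid surprises, I would start with a dry run that only counts the candidate rows. This is a sketch assuming the column names from ckanext-harvest's model (current, state, import_finished); please verify them against your installed version:

```python
# Sketch: count harvest_object rows that a two-month cleanup would touch,
# without deleting anything. Column names assume ckanext-harvest's model.
import psycopg2

DSN = "dbname=ckan user=ckan host=localhost"  # placeholder

with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT count(*)
            FROM harvest_object
            WHERE current = false
              AND state = 'COMPLETE'
              AND import_finished < now() - interval '2 months'
        """)
        print("rows eligible for cleanup:", cur.fetchone()[0])
```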

I have already seen #347, but redirecting the data to Redis does not sound promising to me: Redis will simply evict inserted data once memory hits its configured limit, even if the corresponding harvester run has not finished yet.

So I would like to see an automatic cleanup once jobs are finished, while keeping the records that caused errors for comparison purposes.
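
Something along these lines is what I have in mind: delete completed, non-current objects older than two months, but keep anything that has error records attached. Again this is only a sketch against what I understand ckanext-harvest's schema to be (harvest_object_extra and harvest_object_error reference harvest_object, so dependent rows go first); please verify the table and column names before running anything like it against production:

```python
# Sketch: remove old, completed harvest objects but keep errored ones.
# Table/column names assume ckanext-harvest's model; verify before use.
import psycopg2

DSN = "dbname=ckan user=ckan host=localhost"  # placeholder

# Objects older than two months that finished cleanly, are not the
# "current" object for a dataset, and have no error records attached.
CANDIDATES = """
    SELECT id FROM harvest_object
    WHERE current = false
      AND state = 'COMPLETE'
      AND import_finished < now() - interval '2 months'
      AND id NOT IN (SELECT harvest_object_id FROM harvest_object_error)
"""

with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        # Dependent rows first, then the objects themselves.
        cur.execute(
            f"DELETE FROM harvest_object_extra WHERE harvest_object_id IN ({CANDIDATES})"
        )
        cur.execute(f"DELETE FROM harvest_object WHERE id IN ({CANDIDATES})")
        print("deleted harvest_object rows:", cur.rowcount)
```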

Please give me some advice on best practices.

Thank you very much and have a nice week.
