
HUGE harvest_object table size #567

@dev-rke

Hey there,

We are using CKAN 2.10.4 with ckanext-harvest 1.5.6 and ckanext-dcat 1.7.0.

For about half a year now we have observed the harvest_object table growing steadily.
It is currently around 7 GB, even though our harvesters only run once a day and we handle around 30k datasets per day.
Our package table is only around 32 MB, so we are puzzled: is there really no automatic cleanup mechanism?
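
For reference, this is roughly how I compare the two table sizes (a minimal sketch using psycopg2; the connection string is a placeholder for our CKAN database):

```python
# Sketch: compare on-disk sizes of harvest_object and package in Postgres.
import psycopg2

DSN = "dbname=ckan user=ckan host=localhost"  # placeholder, adjust to your setup

with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        for table in ("harvest_object", "package"):
            cur.execute(
                "SELECT pg_size_pretty(pg_total_relation_size(%s))", (table,)
            )
            print(table, cur.fetchone()[0])
```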

I checked the docs, tried various CLI commands with --help, and took a quick look at the code, but came away without any further ideas.

How can I clean up this table without breaking too much? Should I delete all records that finished more than two months ago?
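
To avoid surprises, I would start with a dry run that only counts the candidate rows. This is a sketch assuming the column names from ckanext-harvest's model (current, state, import_finished); please verify them against your installed version:

```python
# Sketch: count harvest_object rows that a two-month cleanup would touch,
# without deleting anything. Column names assume ckanext-harvest's model.
import psycopg2

DSN = "dbname=ckan user=ckan host=localhost"  # placeholder

with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT count(*)
            FROM harvest_object
            WHERE current = false
              AND state = 'COMPLETE'
              AND import_finished < now() - interval '2 months'
        """)
        print("rows eligible for cleanup:", cur.fetchone()[0])
```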

I have already seen #347, but redirecting the data to Redis does not sound promising to me: Redis will simply evict inserted data once memory hits its configured limit, even if the corresponding harvester run has not finished yet.

So I would like to see an automatic cleanup once jobs are finished, while keeping the records that caused errors for comparison purposes.
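
Something along these lines is what I have in mind: delete completed, non-current objects older than two months, but keep anything that has error records attached. Again this is only a sketch against what I understand ckanext-harvest's schema to be (harvest_object_extra and harvest_object_error reference harvest_object, so dependent rows go first); please verify the table and column names before running anything like it against production:

```python
# Sketch: remove old, completed harvest objects but keep errored ones.
# Table/column names assume ckanext-harvest's model; verify before use.
import psycopg2

DSN = "dbname=ckan user=ckan host=localhost"  # placeholder

# Objects older than two months that finished cleanly, are not the
# "current" object for a dataset, and have no error records attached.
CANDIDATES = """
    SELECT id FROM harvest_object
    WHERE current = false
      AND state = 'COMPLETE'
      AND import_finished < now() - interval '2 months'
      AND id NOT IN (SELECT harvest_object_id FROM harvest_object_error)
"""

with psycopg2.connect(DSN) as conn:
    with conn.cursor() as cur:
        # Dependent rows first, then the objects themselves.
        cur.execute(
            f"DELETE FROM harvest_object_extra WHERE harvest_object_id IN ({CANDIDATES})"
        )
        cur.execute(f"DELETE FROM harvest_object WHERE id IN ({CANDIDATES})")
        print("deleted harvest_object rows:", cur.rowcount)
```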

Please give me some advice on best practices.

Thank you very much and have a nice week.
