Problem with harvester on Latvian open data portal  #543

@vicehovskis

Hello all,

We have some issues with the harvester on our Latvian open data portal: https://data.gov.lv/lv

We currently run CKAN 2.8.6 and can successfully harvest from two CSW resources:

http://195.244.156.233:8080/geoportal/csw
https://geometadati.viss.gov.lv/geoportal/csw

However, we have now created a new OGC CSW endpoint in our Latvian Geoportal test environment. When we try to harvest from it in the test environment in the same way, we receive this error:

999 ERROR [ckanext.spatial.harvesters.csw.CSW.gather] Exception: Traceback (most recent call last):
  File "/usr/lib/ckan/default/src/ckanext-spatial/ckanext/spatial/harvesters/csw.py", line 95, in gather_stage
    for identifier in self.csw.getidentifiers(page=10, outputschema=self.output_schema(), cql=cql):
  File "/usr/lib/ckan/default/src/ckanext-spatial/ckanext/spatial/lib/csw_client.py", line 127, in getidentifiers
    csw.getrecords2(**kwa)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/owslib/csw.py", line 341, in getrecords2
    self._invoke()
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/owslib/csw.py", line 582, in _invoke
    self.response = util.http_post(self.url, self.request, self.lang, self.timeout)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/owslib/util.py", line 285, in http_post
    up = urllib2.urlopen(r,timeout=timeout);
  File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib64/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 500: Internal Server Error
2023-10-23 16:19:01,009 ERROR [ckanext.harvest.queue] Gather stage failed
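
Since the traceback only shows the HTTP status, it may help to send a similar GetRecords request outside CKAN and read the response body, which often carries the server's own exception report. The following is only a minimal sketch using the Python 3 standard library: the function names are made up for illustration, and the request parameters (`csw:Record`, `brief`, the `gmd` output schema) are assumptions that may differ from what the harvester actually sends.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
GMD_NS = "http://www.isotc211.org/2005/gmd"  # assumed output schema

# Make the serializer use the conventional "csw" prefix.
ET.register_namespace("csw", CSW_NS)


def build_getrecords(max_records=10, start_position=1, output_schema=GMD_NS):
    """Return a minimal CSW 2.0.2 GetRecords request body as bytes."""
    root = ET.Element(
        "{%s}GetRecords" % CSW_NS,
        {
            "service": "CSW",
            "version": "2.0.2",
            "resultType": "results",
            "outputSchema": output_schema,
            "startPosition": str(start_position),
            "maxRecords": str(max_records),
        },
    )
    query = ET.SubElement(root, "{%s}Query" % CSW_NS, {"typeNames": "csw:Record"})
    ET.SubElement(query, "{%s}ElementSetName" % CSW_NS).text = "brief"
    return ET.tostring(root, encoding="utf-8")


def post_getrecords(url, timeout=30):
    """POST the request and return (status, body), also for HTTP errors."""
    request = urllib.request.Request(
        url,
        data=build_getrecords(),
        headers={"Content-Type": "application/xml"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # A 500 response still has a body; it usually explains the failure.
        return err.code, err.read().decode("utf-8", "replace")
```

Calling `post_getrecords(...)` with the URL of the new CSW endpoint and printing the returned body should show whether the server rejects the request itself or fails for another reason.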

In addition, our developer is trying to upgrade CKAN to 2.10.1 in our test environment. With the new version the harvester does not work at all: the same errors repeat many times, and we have no idea how to solve them:

sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (builtins.RecursionError) maximum recursion depth exceeded while calling a Python object
[SQL: INSERT INTO harvest_log (id, content, level, created) VALUES (%(id)s, %(content)s, %(level)s, %(created)s)]
[parameters: [{'content': 'Harvest source not found for dataset 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9', 'level': 'ERROR'}]] (Background on this error at: https://sqlalche.me/e/14/7s2a)
2023-11-14 16:29:53,874 ERROR [ckanext.harvest.plugin] Harvest source not found for dataset 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9
2023-11-14 16:29:54,463 ERROR [ckan.lib.search] This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (builtins.RecursionError) maximum recursion depth exceeded while calling a Python object
[SQL: INSERT INTO harvest_log (id, content, level, created) VALUES (%(id)s, %(content)s, %(level)s, %(created)s)]
[parameters: [{'content': 'Harvest source not found for dataset 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9', 'level': 'ERROR'}]] (Background on this error at: https://sqlalche.me/e/14/7s2a)
Traceback (most recent call last):
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/__init__.py", line 143, in dispatch_by_operation
index.insert_dict(entity)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 79, in insert_dict
return self.update_dict(data)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 106, in update_dict
self.index_package(pkg_dict, defer_commit)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 123, in index_package
validated_pkg_dict, _errors = lib_plugins.plugin_validate(
File "/usr/lib/ckan/default/src/ckan/ckan/lib/plugins.py", line 331, in plugin_validate
return validate(data_dict, schema, context)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/navl/dictization_functions.py", line 305, in validate
flat_data, errors = _validate(flattened, schema, validators_context)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/navl/dictization_functions.py", line 356, in _validate
convert(converter, key, converted_data, errors, context)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/navl/dictization_functions.py", line 262, in convert
value = converter(*params)
File "/usr/lib/ckan/default/src/ckan/ckan/logic/validators.py", line 195, in package_id_exists
result = session.query(model.Package).get(value)
File "<string>", line 2, in get
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/util/deprecations.py", line 402, in warned
return fn(*args, **kwargs)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/query.py", line 947, in get
return self._get_impl(ident, loading.load_on_pk_identity)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/query.py", line 951, in _get_impl
return self.session._get_impl(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 2912, in _get_impl
return db_load_fn(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/loading.py", line 530, in load_on_pk_identity
session.execute(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 1711, in execute
conn = self._connection_for_bind(bind)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 1552, in _connection_for_bind
return self._transaction._connection_for_bind(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 721, in _connection_for_bind
self._assert_active()
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 601, in _assert_active
raise sa_exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (builtins.RecursionError) maximum recursion depth exceeded while calling a Python object
[SQL: INSERT INTO harvest_log (id, content, level, created) VALUES (%(id)s, %(content)s, %(level)s, %(created)s)]
[parameters: [{'content': 'Harvest source not found for dataset 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9', 'level': 'ERROR'}]] (Background on this error at: https://sqlalche.me/e/14/7s2a)
2023-11-14 16:29:54,464 ERROR [ckan.model.modification] This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (builtins.RecursionError) maximum recursion depth exceeded while calling a Python object
[SQL: INSERT INTO harvest_log (id, content, level, created) VALUES (%(id)s, %(content)s, %(level)s, %(created)s)]
[parameters: [{'content': 'Harvest source not found for dataset 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9', 'level': 'ERROR'}]] (Background on this error at: https://sqlalche.me/e/14/7s2a)
Traceback (most recent call last):
File "/usr/lib/ckan/default/src/ckan/ckan/model/modification.py", line 71, in notify
observer.notify(entity, operation)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/__init__.py", line 165, in notify
dispatch_by_operation(
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/__init__.py", line 143, in dispatch_by_operation
index.insert_dict(entity)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 79, in insert_dict
return self.update_dict(data)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 106, in update_dict
self.index_package(pkg_dict, defer_commit)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 123, in index_package
validated_pkg_dict, _errors = lib_plugins.plugin_validate(
File "/usr/lib/ckan/default/src/ckan/ckan/lib/plugins.py", line 331, in plugin_validate
return validate(data_dict, schema, context)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/navl/dictization_functions.py", line 305, in validate
flat_data, errors = _validate(flattened, schema, validators_context)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/navl/dictization_functions.py", line 356, in _validate
convert(converter, key, converted_data, errors, context)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/navl/dictization_functions.py", line 262, in convert
value = converter(*params)
File "/usr/lib/ckan/default/src/ckan/ckan/logic/validators.py", line 195, in package_id_exists
result = session.query(model.Package).get(value)
File "<string>", line 2, in get
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/util/deprecations.py", line 402, in warned
return fn(*args, **kwargs)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/query.py", line 947, in get
return self._get_impl(ident, loading.load_on_pk_identity)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/query.py", line 951, in _get_impl
return self.session._get_impl(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 2912, in _get_impl
return db_load_fn(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/loading.py", line 530, in load_on_pk_identity
session.execute(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 1711, in execute
conn = self._connection_for_bind(bind)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 1552, in _connection_for_bind
return self._transaction._connection_for_bind(
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 721, in _connection_for_bind
self._assert_active()
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 601, in _assert_active
raise sa_exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (builtins.RecursionError) maximum recursion depth exceeded while calling a Python object
[SQL: INSERT INTO harvest_log (id, content, level, created) VALUES (%(id)s, %(content)s, %(level)s, %(created)s)]
[parameters: [{'content': 'Harvest source not found for dataset 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9', 'level': 'ERROR'}]] (Background on this error at: https://sqlalche.me/e/14/7s2a)
2023-11-14 16:29:53,790 INFO [ckanext.harvest.plugin] Creating harvest source: {'__extras': {'active': True}, 'frequency': 'MANUAL', 'name': 'geolatvija-test', 'owner_org': '537dea7f-c55d-4607-8838-260bea1c7f1e', 'source_type': 'csw', 'title': 'zmni', 'type': 'harvest', 'url': 'https://geolatvija-test.vraa.gov.lv/geonetwork/opendata/eng/csw', 'extras': [{'key': 'frequency', 'value': 'MANUAL'}, {'key': 'source_type', 'value': 'csw'}], 'creator_user_id': '452a87b4-897b-4b62-8d8a-eae98964ce55', 'id': '8bbf3c49-0fac-4575-a0bb-d3e52b6996f9'}
2023-11-14 16:29:54,465 INFO [ckanext.harvest.plugin] Harvest source created: 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9
Traceback (most recent call last):
File "/usr/lib/ckan/default/bin/ckan", line 8, in <module>
sys.exit(ckan())
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/cli.py", line 51, in create
result = utils.create_harvest_source(
File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/utils.py", line 139, in create_harvest_source
source = tk.get_action("harvest_source_create")(context, data_dict)
File "/usr/lib/ckan/default/src/ckan/ckan/logic/__init__.py", line 551, in wrapped
result = _action(context, data_dict, **kw)
File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/logic/action/create.py", line 72, in harvest_source_create
source = toolkit.get_action('package_create')(context, data_dict)
File "/usr/lib/ckan/default/src/ckan/ckan/logic/__init__.py", line 551, in wrapped
result = _action(context, data_dict, **kw)
File "/usr/lib/ckan/default/src/ckan/ckan/logic/action/create.py", line 243, in package_create
model.repo.commit()
File "<string>", line 2, in commit
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 1451, in commit
self._transaction.commit(_to_root=self.future)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 827, in commit
self._assert_active(prepared_ok=True)
File "/usr/lib/ckan/default/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 601, in _assert_active
raise sa_exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (builtins.RecursionError) maximum recursion depth exceeded while calling a Python object
[SQL: INSERT INTO harvest_log (id, content, level, created) VALUES (%(id)s, %(content)s, %(level)s, %(created)s)]
[parameters: [{'content': 'Harvest source not found for dataset 8bbf3c49-0fac-4575-a0bb-d3e52b6996f9', 'level': 'ERROR'}]]
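
Every `PendingRollbackError` above quotes an "Original exception was: (...)" part, so the repeated errors may all trace back to one earlier failure. To check this over a full log file, a small script along these lines could be used. It is only a sketch: `root_causes` is a hypothetical helper name, and the pattern assumes the message wording shown in the log above.

```python
import re

# Matches the part of a PendingRollbackError message that names the
# exception which originally invalidated the session.
ORIGINAL_EXC = re.compile(
    r"Original exception was: \((?P<type>[\w.]+)\) (?P<message>.*)"
)


def root_causes(log_lines):
    """Count how often each original exception appears in the log lines."""
    counts = {}
    for line in log_lines:
        match = ORIGINAL_EXC.search(line)
        if match:
            key = (match.group("type"), match.group("message").strip())
            counts[key] = counts.get(key, 0) + 1
    return counts
```

If the result contains a single key, the later errors are most likely consequences of that first exception rather than separate problems.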
