[ENG-9831] Cedar metadata record creation/deletion in share for collection submission#11710

Open
ihorsokhanexoft wants to merge 9 commits into CenterForOpenScience:feature/es2-consolidation from ihorsokhanexoft:feature/ENG-9831

Conversation


@ihorsokhanexoft ihorsokhanexoft commented Apr 23, 2026

@adlius adlius requested a review from aaxelb April 29, 2026 17:31

@aaxelb aaxelb left a comment


mainly: tests

also, suggestions

Comment thread api/share/utils.py Outdated
shtrove_ingest_url(),
params={
    'focus_iri': iri,
    'record_identifier': f"CedarMetadataRecord:{cedar_record.guid._id}:{cedar_record.template.cedar_id}",

since it's important for the record_identifier to match when created/deleted, best move this to a function like _shtrove_cedar_record_identifier(...) (maybe just below the existing _shtrove_record_identifier) for use both places

also, tho i know i suggested this format in the first place, on reflection would prefer something like f'{cedar_record.guid._id}/CedarMetadataRecord:{cedar_record.template.cedar_id}' (to parallel the other supplementary record_identifiers that start with osfid/)
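A minimal sketch of the suggested helper, assuming the attribute access shown in the snippet above (`guid._id`, `template.cedar_id`); the `SimpleNamespace` objects below stand in for a real `CedarMetadataRecord` and are for illustration only:

```python
from types import SimpleNamespace

def _shtrove_cedar_record_identifier(cedar_record) -> str:
    # single source of truth for the identifier, so create and delete
    # always agree; parallels other supplementary identifiers ("osfid/...")
    return f'{cedar_record.guid._id}/CedarMetadataRecord:{cedar_record.template.cedar_id}'

# stand-in objects for illustration only
_record = SimpleNamespace(
    guid=SimpleNamespace(_id='abc12'),
    template=SimpleNamespace(cedar_id='https://repo.metadatacenter.org/templates/xyz'),
)
identifier = _shtrove_cedar_record_identifier(_record)
```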

Comment thread api/share/utils.py Outdated
Comment on lines +79 to +87
graph = Graph()
iri = referent.get_semantic_iri()
full_metadata = {
'@id': iri,
OSF.hasCedarRecord: cedar_record.metadata,
}
graph.parse(data=json.dumps(full_metadata), format='json-ld')

serialized_data = graph.serialize(format='turtle')

these lines would make sense as a separate function with its own tests

Comment thread api/share/utils.py Outdated
cedar_record = CedarMetadataRecord.objects.filter(pk=cedar_record_pk).first()
if not cedar_record:
return


if a CedarMetadataRecord is deleted, will this stop before sending the DELETE to shtrove? if a deleted record isn't still in the database, might be better for this task's args to take the cedar template id instead of the record id, so can build the record_identifier without having to load the cedar record

tests would help make sure, either way

Comment thread osf/models/provider.py
Comment thread osf/models/collection_submission.py Outdated
Comment on lines +469 to +479
for cedar_record in self.guid.cedar_metadata_records.filter(
    is_published=True,
    template__should_index_for_search=True,
):
    enqueue_task(share_update_cedar_metadata_record.s(self.guid._id, cedar_record.pk))

for cedar_record in self.guid.cedar_metadata_records.filter(
    models.Q(is_published=False) | models.Q(template__should_index_for_search=False)
):
    enqueue_task(share_delete_cedar_metadata_record.s(self.guid._id, cedar_record.pk))


enqueueing tasks from the model here seems brittle... there are other times when these would probably be expected to get indexed for search (e.g. management commands or osf-admin buttons reindexing the object described by the cedar record)

would be more reliable to move this to a followup task after task__update_share completes -- perhaps enqueued here, when _next_partition is None: https://github.com/ihorsokhanexoft/osf.io/blob/06ef8687d1cb004c7eff54c7434e0e26006ef276/api/share/utils.py#L176

(i understand wanting to avoid redundant updates, and the whole update_share chain could be optimized to better avoid redundant updates, but it's not terribly expensive and better to send too often than not enough)

Comment thread api/share/utils.py Outdated
enqueue_task(task__update_share.s(_osfguid_value))


@celery_app.task()

these new tasks should both have similar retry behavior to task__update_share when the request fails -- could move this block to a reusable function https://github.com/ihorsokhanexoft/osf.io/blob/06ef8687d1cb004c7eff54c7434e0e26006ef276/api/share/utils.py#L152-L171
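The actual retry mechanics live at the linked lines of `task__update_share`; as an illustrative-only sketch of the kind of decision logic a reusable helper could centralize (the policy and numbers here are assumptions, not the existing behavior):

```python
def should_retry_share_response(status_code: int, retries_so_far: int, max_retries: int = 5) -> bool:
    """Decide whether a failed shtrove request is worth retrying.

    Illustrative policy only: retry rate limits and server errors,
    give up on client errors and after max_retries attempts.
    """
    if retries_so_far >= max_retries:
        return False
    return status_code == 429 or status_code >= 500
```

Each task would then call the shared helper and invoke its own `self.retry(...)` when the helper says to, keeping the backoff policy in one place.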

@ihorsokhanexoft ihorsokhanexoft requested a review from aaxelb May 5, 2026 12:18

@aaxelb aaxelb left a comment


looking better! two small requests

appreciate the testing effort (tho it could have been easier)

# fetch urls from result and assign ns prefixes based on order of appearance in result to make test resilient to changes in order of namespace declaration in turtle output
for url in urls_to_find.keys():
    urls_to_find[url] = result[result.index(url) - 3]

ohh yeah i forgot this disappointing part of rdflib, the unstable turtle serializer -- but instead of complicating test logic like this (since namespace prefixes and object ordering don't really matter) i've found it better to assert on the graph parsed from the given turtle

maybe move this _assert_equivalent_turtle method to reusable assert_equivalent_turtle (with passthru label kwarg instead of filename) in osf_tests/metadata/_utils.py and import from there, then write this test with a simpler "expected" block string (without having to match namespaces by hand)

Comment thread api/share/utils.py
Comment on lines +193 to +208
for cedar_record in _osfid_instance.cedar_metadata_records.filter(
    is_published=True,
    template__should_index_for_search=True,
):
    enqueue_task(share_update_cedar_metadata_record.s(_osfid_instance._id, cedar_record.pk))

for cedar_record in _osfid_instance.cedar_metadata_records.filter(
    Q(is_published=False) | Q(template__should_index_for_search=False),
):
    enqueue_task(
        share_delete_cedar_metadata_record.s(
            cedar_record.guid._id,
            cedar_record._id,
            cedar_record.template.cedar_id,
        ),
    )

these'll be run multiple times, for each osfmap partition -- probably want them in an else block, run when _next_partition is None (after the last osfmap partition has been sent)

also for cleanliness sake, would make sense in a separate function

_next_partition = _next_osfmap_partition(_osfmap_partition)
if _next_partition is not None:
   ...
else:  # schedule non-osfmap supplements
   _schedule_cedar_record_updates(_osfid_instance)
