feat: update content libraries API to use events from openedx-core [FC-0117] #38437
| { # Not 100% sure we want this, but a PUBLISHED event is emitted for container 2 | ||
| # because one of its children's published versions has changed, so whether or | ||
| # not it contains unpublished changes may have changed and the search index | ||
| # may need to be updated. It is not actually published though. | ||
| # TODO: should this be a CONTAINER_CHILD_PUBLISHED event? | ||
| # No PUBLISHED event is emitted for container 2, because it doesn't have a published version yet. | ||
| # Publishing 'html_block' would have potentially affected it if container 2's published version had a | ||
| # reference to 'html_block', but it doesn't yet until we publish it. | ||
| ) | ||
|
|
||
| # note that container 2 is still unpublished | ||
| c2_after = self._get_container(container2["id"]) | ||
| assert c2_after["has_unpublished_changes"] | ||
|
|
||
| # publish container2 now: | ||
| self._publish_container(container2["id"]) | ||
| self.expect_new_events( | ||
| { # An event for container 2 being published: | ||
| "signal": LIBRARY_CONTAINER_PUBLISHED, | ||
| "library_container": LibraryContainerData( | ||
| container_key=LibraryContainerLocator.from_string(container2["id"]), | ||
| ), | ||
| }, | ||
| { # An event for the html block in container 2 only: | ||
| "signal": LIBRARY_BLOCK_PUBLISHED, | ||
| "library_block": LibraryBlockData( | ||
| self.lib1_key, LibraryUsageLocatorV2.from_string(html_block2["id"]), | ||
| ), | ||
| }, |
It's a little hard to tell from the diff here (because of how it's split up), but before this PR, a spurious PUBLISHED event was emitted for container 2 before it was ever published at all. I think the new behavior is much more correct, because it's built on Learning Core's new publish log side effects. I have explained why in the test case and added additional tests to ensure side effects still result in PUBLISHED events when they should (that is, once container 2 has actually been published).
| { | ||
| "signal": CONTENT_OBJECT_ASSOCIATIONS_CHANGED, | ||
| "content_object": ContentObjectChangedData( | ||
| object_id=str(container_key), | ||
| changes=["collections", "tags"], | ||
| ), | ||
| }, | ||
| # We used to emit CONTENT_OBJECT_ASSOCIATIONS_CHANGED here for the restored container, specifically noting | ||
| # that changes=["collections", "tags"], because deleted things may have collections+tags that are once | ||
| # again relevant when it is restored. However, the CREATED event should be sufficient for notifying of that. | ||
| # (Or should we emit CREATED+UPDATED to be extra sure?) |
Flagging this, as it's a change - no longer emitting CONTENT_OBJECT_ASSOCIATIONS_CHANGED in the case of restoring a deleted object.
TODO: test publishing a thing with collections and tags, delete it, then "revert all changes" in the library UI and make sure it re-appears with collections and tags intact. I haven't tested this yet.
| # openedx_content also lists ancestor containers of the affected units as changed. | ||
| # We don't strictly need this at the moment, at least as far as keeping our search index updated. | ||
| { | ||
| "signal": LIBRARY_CONTAINER_UPDATED, | ||
| "library_container": LibraryContainerData(container_key=self.subsection1.container_key), | ||
| }, | ||
| { | ||
| "signal": LIBRARY_CONTAINER_UPDATED, | ||
| "library_container": LibraryContainerData(container_key=self.subsection2.container_key), | ||
| }, | ||
| { | ||
| "signal": LIBRARY_CONTAINER_UPDATED, | ||
| "library_container": LibraryContainerData(container_key=self.section1.container_key), | ||
| }, | ||
| { | ||
| "signal": LIBRARY_CONTAINER_UPDATED, | ||
| "library_container": LibraryContainerData(container_key=self.section2.container_key), |
Last change: we now emit events for ancestors of parent containers of modified entities, which we weren't doing previously (it was only one level: parent containers, but not their ancestors in turn). I don't think we have a use case for this, but I am not sure if I could or should filter them out somehow, as the publish log treats direct ancestors (which we definitely care about and need events for) and their ancestors in turn exactly the same.
To avoid performance issues, in such cases where more than one ancestor is included in the event stream, the event for the directly modified entity is emitted synchronously but the indirect container events are emitted asynchronously. This seems to work well in the UI, making it update correctly/immediately when e.g. renaming something, but should still preserve performance even if you rename a component used in thousands of different containers.
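As a rough illustration of the sync/async split described above, here is a minimal sketch. It uses a thread pool as a stand-in for celery, and `dispatch_events` and the entity IDs are hypothetical names, not the code in this PR. It also ignores the "more than one ancestor" threshold and always defers the indirect events.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def dispatch_events(direct_ids, indirect_ids, handler, executor):
    """Handle direct changes inline; hand indirect ones to a background pool."""
    for entity_id in direct_ids:
        handler(entity_id)  # synchronous: finished before we return
    # Indirect (ancestor) events are queued and may complete later.
    return [executor.submit(handler, entity_id) for entity_id in indirect_ids]


handled = []
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = dispatch_events(
        ["component-1"],                         # the entity that was renamed
        ["unit-1", "subsection-1", "section-1"],  # its ancestors (assumed IDs)
        handled.append,
        pool,
    )
    # The direct event has already been handled at this point.
    direct_done_first = handled[:1] == ["component-1"]
    wait(futures)  # only needed here so the example is deterministic
```

The caller sees the directly modified entity updated immediately, while the cost of the ancestor updates does not block the request.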
(Still need to look through test changes.)
At a high level, I do have a bit of a concern that having some things be sync and some async at a granular level (depending on how many things there are) is going to lead to inconsistencies and bugs. I think it's a reasonable tradeoff at the moment--just something we should keep an eye on.
| # Which entities were _directly_ changed here? | ||
| direct_changes = [asdict(change) for change in change_log.changes if change.new_version != change.old_version] | ||
| # And which entities were indirectly affected (e.g. parent containers)? | ||
| indirect_changes = [asdict(change) for change in change_log.changes if change.new_version == change.old_version] |
[Comment] This reminds me that we should probably put a couple of helper methods in DraftChangeLog and DraftChangeLogRecord for this sort of thing, so we can keep the terminology consistent over time. Made a ticket for that: openedx/openedx-core#560
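For what it's worth, such helpers might look something like the sketch below. `ChangeRecord`, `is_direct`, `is_side_effect`, and `split_changes` are hypothetical names on a stand-in dataclass, not the real `DraftChangeLogRecord` model. The only thing taken from the diff is the rule that a record is "direct" when `new_version != old_version`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical stand-in for a change log record (not the real model)."""
    entity_id: int
    old_version: Optional[int]
    new_version: Optional[int]

    @property
    def is_direct(self) -> bool:
        # The entity itself got a different version (created, edited, deleted).
        return self.new_version != self.old_version

    @property
    def is_side_effect(self) -> bool:
        # Version unchanged: the record exists only because something this
        # entity depends on (e.g. a child) changed.
        return not self.is_direct


def split_changes(records):
    """Return (direct, indirect) lists, preserving the original order."""
    direct = [r for r in records if r.is_direct]
    indirect = [r for r in records if r.is_side_effect]
    return direct, indirect


records = [
    ChangeRecord(entity_id=1, old_version=3, new_version=4),     # edited component
    ChangeRecord(entity_id=2, old_version=7, new_version=7),     # parent unit, same version
    ChangeRecord(entity_id=3, old_version=None, new_version=1),  # newly created
]
direct, indirect = split_changes(records)
```

Keeping the comparison in one place means "direct" and "side effect" have a single definition that callers cannot drift away from.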
| # .. event_implemented_name: LIBRARY_COLLECTION_CREATED | ||
| # .. event_type: org.openedx.content_authoring.content_library.collection.created.v1 |
I think these just go where the signal is defined, not where it's sent.
I realized later in the review that this was like this in the code that you refactored, but I still think it's wrong and should be removed in all the places other than where the signal is first defined.
OK, great! I didn't want them there anyways; I was just copying the existing pattern without understanding it.
@ormsbee Actually, it seems the event annotations are quite deliberate - see #36473 and it is mentioned in these docs:
In-line code annotations are also used when integrating the event into the service.
It's not super clear to me why this is the case but I think it's related to what the doc says at the end: "ensures that [the event] is used correctly across services" ?
Maybe @mariajgrimaldi or @BryanttV can clarify how these are used?
Ack, you're totally right. That's... really weird to me. But okay, thank you.
| # list of "units this component is used in", "sections this subsection is used in", etc. in the search index | ||
| title_changed: bool = bool(old_version and new_version) and (old_version.title != new_version.title) | ||
| if title_changed: | ||
| # TODO: there is no "get entity list for container version" API in openedx_content |
You could also get effectively the same thing via dependencies (new_version.dependencies.all())
| if hasattr(entity, "component"): | ||
| opaque_key = api.library_component_usage_key(library_key, entity.component) | ||
| elif hasattr(entity, "container"): | ||
| opaque_key = api.library_container_locator(library_key, entity.container) |
Nit: This really happens often enough where it seems like it should be a helper fn somewhere.
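A possible shape for that helper, as a sketch only: `opaque_key_for_entity` is a hypothetical name, and the two key-building callables are passed in so the example stays self-contained. In the real code they would presumably be `api.library_component_usage_key` and `api.library_container_locator` from the snippet above.

```python
from types import SimpleNamespace


def opaque_key_for_entity(library_key, entity, component_key_fn, container_key_fn):
    """Return the opaque key for a publishable entity, whatever its kind."""
    if hasattr(entity, "component"):
        return component_key_fn(library_key, entity.component)
    if hasattr(entity, "container"):
        return container_key_fn(library_key, entity.container)
    raise TypeError(f"Entity {entity!r} is neither a component nor a container")


# Stand-in key builders and entities, for illustration only.
def fake_component_key(lib, component):
    return f"{lib}:component:{component}"


def fake_container_key(lib, container):
    return f"{lib}:container:{container}"


html_entity = SimpleNamespace(component="html1")
unit_entity = SimpleNamespace(container="unit1")

html_key = opaque_key_for_entity("lib1", html_entity, fake_component_key, fake_container_key)
unit_key = opaque_key_for_entity("lib1", unit_entity, fake_component_key, fake_container_key)
```

Raising on an unknown entity type, rather than leaving `opaque_key` unset, makes a new kind of entity fail loudly instead of silently skipping an event.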
ormsbee
left a comment
Just a minor additional nit request.
Yeah, I would prefer a more consistent approach too. But it comes from our direct experience with the libraries work... making everything async makes updating the UI after any change pretty awkward, and making everything sync is way too slow in many cases like renaming something that is used in many different places. So even though it's more complex, this sort of compromise seems to work best for now.
| # First, remove all children from the subsection: | ||
| with self.captureOnCommitCallbacks(execute=False): # suppress events | ||
| library_api.update_container_children(self.subsection.container_key, [], None) |
Before: this test was changing a subsection to have the exact same unit child it already had, and that was emitting an event and updating the search index, because library_api.update_container_children was just hard-coded to send out LIBRARY_CONTAINER_UPDATED and CONTENT_OBJECT_ASSOCIATIONS_CHANGED events every time.
Now: our event logic is "smarter" and only sends out events if the container's children actually changed. So to keep the test working, first I have to clear the container's children.
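To make the before/after difference concrete, here is a minimal sketch of "only emit when the children actually changed". The dict-based container, `update_children`, and `emit` are hypothetical stand-ins, not the real `library_api` code. Unlike the test above, the sketch does not suppress the event from the clearing step.

```python
def update_children(container, new_children, emit):
    """Replace a container's children; emit an event only on a real change."""
    new_children = list(new_children)
    if container["children"] == new_children:
        return False  # no-op: same children in the same order, so no event
    container["children"] = new_children
    emit("LIBRARY_CONTAINER_UPDATED", container["id"])
    return True


events = []


def emit(signal_name, key):
    events.append((signal_name, key))


subsection = {"id": "subsection-1", "children": ["unit-1"]}

update_children(subsection, ["unit-1"], emit)  # same child: nothing emitted
events_after_noop = list(events)

update_children(subsection, [], emit)          # clear the children first
update_children(subsection, ["unit-1"], emit)  # now this is a real change
```

This is why the test has to empty the subsection first: re-adding the same unit to an unchanged list is now a no-op as far as events are concerned.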
@ormsbee I think I've addressed all your comments. I want to do a bit more manual testing, but I think it's in good shape.
Now, the search index (and anything else that listens for events) will stay up to date regardless of whether one uses the content_libraries high-level API or the low-level openedx_content APIs to make changes to content.
@ormsbee When manually testing this now with production-like settings (celery enabled), I found that the async updating of the search index was resulting in some changes taking too long to appear in the UX, so I've pushed one new commit e5b4048 which resolves this by changing all our library event handlers to use the "run async but try to wait" logic from #36640. This made everything work much better in the UI, and I think the code is nicely simplified as well. I squashed all the previous commits that you've reviewed, so just need a review / sanity check on that new commit if you could, please.
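For readers unfamiliar with the "run async but try to wait" idea, the general pattern can be sketched as below. This is not the implementation from #36640: it uses a thread pool instead of celery, and `run_and_try_to_wait` and its `timeout` parameter are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_and_try_to_wait(executor, func, *args, timeout=0.5):
    """Start func in the background and wait up to `timeout` seconds for it.

    Returns True if the work finished in time, False if it is still running
    (in which case it simply completes later, in the background).
    """
    future = executor.submit(func, *args)
    try:
        future.result(timeout=timeout)
        return True
    except FutureTimeout:
        return False


with ThreadPoolExecutor(max_workers=2) as pool:
    # Quick job: finishes well within the wait, so the caller sees fresh data.
    fast = run_and_try_to_wait(pool, time.sleep, 0.01, timeout=1.0)
    # Slow job: the caller gives up waiting but the job keeps running.
    slow = run_and_try_to_wait(pool, time.sleep, 0.3, timeout=0.05)
```

The common case (a small update) behaves as if it were synchronous, while a large update cannot hold the request open for longer than the timeout.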
New changes LGTM
…penedx#38437)
* chore: bump openedx-core to 0.46.0
* feat: update content libraries API to use events from openedx_content
* fix: better dispatching/waiting for async library event handlers

Now, the search index (and anything else that listens for events) will stay up to date regardless of whether one uses the content_libraries high-level API or the low-level openedx_content APIs to make changes to content.
Co-Authored-By: Claude <noreply@anthropic.com>
Description
With openedx/openedx-core#543, openedx-core now emits events when changes happen within a Learning Package.
This PR updates the content libraries code and search code accordingly. The main benefit is that the search index now stays up to date regardless of which APIs are used. We don't need to "wrap" some low-level APIs in high-level APIs just to add events.
Note: The "Library Collections" code was already working fine because it used Django signals to watch for changes to the Collection-PublishableEntity many-to-many relationship, but it shouldn't have been so aware of the internals of openedx_content.
Supporting information
See openedx/openedx-core#462
Testing instructions
First, enable celery on your devstack:
1. Edit cms/envs/devstack.py and change CELERY_ALWAYS_EAGER to False (with True, tasks run eagerly in-process and never reach the worker)
2. Run: tutor dev exec cms celery --app=cms.celery worker --loglevel=info --hostname=edx.cms.core.default.%h --queues=edx.cms.core.default,edx.cms.core.high,edx.cms.core.low --without-gossip --without-mingle

Then, go to the content libraries UI http://apps.local.openedx.io:2001/authoring/libraries and:
Test change publishing:
Test collections:
etc.
Basically, make changes in the library authoring UI and make sure the search results stay up to date as you go through making edits.
Deadline
Verawood
Other information
Depends on openedx/openedx-core#543 .
I wrote most of the code but used Claude Code for small bits and pieces.