fix(server): avoid loading huge task results for metadata queries#3060
fix(server): avoid loading huge task results for metadata queries#3060contrueCT wants to merge 1 commit into
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3060 +/- ##
============================================
- Coverage 36.10% 29.90% -6.20%
+ Complexity 338 264 -74
============================================
Files 803 804 +1
Lines 68291 68382 +91
Branches 8970 8982 +12
============================================
- Hits 24656 20452 -4204
- Misses 40990 45564 +4574
+ Partials 2645 2366 -279 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
imbajin
left a comment
There was a problem hiding this comment.
Blocking: yes. Summary: Found a scheduler API compatibility regression and a failing targeted test. Evidence: TaskCoreTest#testTaskWithoutResult failed locally; codecov/project is also failing.
🔗 Please also check the failed codecov/project status: https://app.codecov.io/gh/apache/hugegraph/pull/3060
| @Override | ||
| public <V> Iterator<HugeTask<V>> tasks(List<Id> ids) { | ||
| return this.tasksWithoutResult(ids); | ||
| return this.tasks(ids, false); |
There was a problem hiding this comment.
tasks(...) result contract
The new TaskScheduler default methods keep the old behavior as tasks(ids) -> tasks(ids, true), but this override changes the task/result-backed scheduler to return metadata-only tasks by default. Existing Java callers of scheduler.tasks(...).next().result() can now silently get null while scheduler.task(id) still returns the result. Please keep the no-result path opt-in by removing these overrides or delegating them to withResult=true, and pass false only from the metadata-only call sites such as REST list, restore, delete, and scheduler scans.
| TaskScheduler scheduler = graph.taskScheduler(); | ||
| CountDownLatch latch = new CountDownLatch(1); | ||
|
|
||
| TaskCallable<String> callable = new TaskCallable<String>() { |
There was a problem hiding this comment.
This anonymous TaskCallable is persisted by class name, then the worker reloads it through TaskCallable.fromClass(). It has no public no-arg constructor, so the task fails to load as TaskCallable$1, and scheduler.waitUntilTaskCompleted(id, 10) times out. I reproduced it with mvn test -pl hugegraph-server/hugegraph-test -am -P unit-test -Dtest=TaskCoreTest#testTaskWithoutResult -DfailIfNoTests=false -ntp. Please use an existing reloadable callable or add a small static callable class for this test.
Purpose of the PR
Fix task metadata/recovery paths that can still load and decompress large task results, making
GET /tasks, task restore, andDELETE /tasks/{id}?force=trueunusable when historical tasks contain huge result payloads.This is related to #3057 and #3059. The PR focuses on metadata-only access and cleanup paths, while large Gremlin result export/chunking can be handled separately.
Main Changes
~task_result.GET /tasks/{id}default behavior compatible, and addwith_result=false.GET /tasks, task restore, and delete paths use metadata-only reads.GraphTransaction.removeVertex()force-loading indexed task vertices.~taskresultvertices when cleaning distributed task results.withResult=false.Verifying these changes
mvn test -pl hugegraph-server/hugegraph-test -am -P core-test,rocksdb -Dtest=TaskCoreTest -DfailIfNoTests=false -Dcheckstyle.skip=true -Drat.skip=truemvn compile -pl hugegraph-server/hugegraph-api -am -DskipTests=true -Dcheckstyle.skip=true -Drat.skip=truemvn test-compile -pl hugegraph-server/hugegraph-test -am -P api-test,rocksdb -Dtest=TaskApiTest -DskipTests=true -Dcheckstyle.skip=true -Drat.skip=trueDoes this PR potentially affect the following parts?
Documentation Status
Doc - TODODoc - DoneDoc - No Need