Skip to content

fix(server): avoid loading huge task results for metadata queries#3060

Open
contrueCT wants to merge 1 commit into
apache:masterfrom
contrueCT:task/avoid-loading-task-result
Open

fix(server): avoid loading huge task results for metadata queries#3060
contrueCT wants to merge 1 commit into
apache:masterfrom
contrueCT:task/avoid-loading-task-result

Conversation

@contrueCT

Copy link
Copy Markdown
Contributor

Purpose of the PR

Fix task metadata/recovery paths that can still load and decompress large task results, making GET /tasks, task restore, and DELETE /tasks/{id}?force=true unusable when historical tasks contain huge result payloads.

This is related to #3057 and #3059. The PR focuses on metadata-only access and cleanup paths, while large Gremlin result export/chunking can be handled separately.

Main Changes

  • Add scheduler APIs that allow task reads with or without result payloads.
  • Add metadata-only task deserialization that avoids reading ~task_result.
  • Keep GET /tasks/{id} default behavior compatible, and add with_result=false.
  • Make GET /tasks, task restore, and delete paths use metadata-only reads.
  • Add lightweight task vertex deletion to avoid GraphTransaction.removeVertex() force-loading indexed task vertices.
  • Avoid loading separated ~taskresult vertices when cleaning distributed task results.
  • Return a result-stripped copy for in-memory tasks when withResult=false.

Verifying these changes

  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    • mvn test -pl hugegraph-server/hugegraph-test -am -P core-test,rocksdb -Dtest=TaskCoreTest -DfailIfNoTests=false -Dcheckstyle.skip=true -Drat.skip=true
    • mvn compile -pl hugegraph-server/hugegraph-api -am -DskipTests=true -Dcheckstyle.skip=true -Drat.skip=true
    • mvn test-compile -pl hugegraph-server/hugegraph-test -am -P api-test,rocksdb -Dtest=TaskApiTest -DskipTests=true -Dcheckstyle.skip=true -Drat.skip=true

Does this PR potentially affect the following parts?

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. api Changes of API bug Something isn't working labels Jun 17, 2026
@codecov

codecov Bot commented Jun 17, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 59.84252% with 51 lines in your changes missing coverage. Please review.
✅ Project coverage is 29.90%. Comparing base (c3f56b5) to head (47fb2f9).

Files with missing lines Patch % Lines
.../apache/hugegraph/task/TaskAndResultScheduler.java 0.00% 26 Missing ⚠️
...g/apache/hugegraph/task/StandardTaskScheduler.java 70.37% 7 Missing and 1 partial ⚠️
.../main/java/org/apache/hugegraph/task/HugeTask.java 86.84% 2 Missing and 3 partials ⚠️
...pache/hugegraph/task/DistributedTaskScheduler.java 0.00% 4 Missing ⚠️
...ava/org/apache/hugegraph/task/TaskTransaction.java 76.47% 1 Missing and 3 partials ⚠️
.../org/apache/hugegraph/auth/HugeGraphAuthProxy.java 57.14% 3 Missing ⚠️
...ain/java/org/apache/hugegraph/api/job/TaskAPI.java 80.00% 1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (c3f56b5) and HEAD (47fb2f9). Click for more details.

HEAD has 2 uploads less than BASE
Flag BASE (c3f56b5) HEAD (47fb2f9)
3 1
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3060      +/-   ##
============================================
- Coverage     36.10%   29.90%   -6.20%     
+ Complexity      338      264      -74     
============================================
  Files           803      804       +1     
  Lines         68291    68382      +91     
  Branches       8970     8982      +12     
============================================
- Hits          24656    20452    -4204     
- Misses        40990    45564    +4574     
+ Partials       2645     2366     -279     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@imbajin imbajin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: yes. Summary: Found a scheduler API compatibility regression and a failing targeted test. Evidence: TaskCoreTest#testTaskWithoutResult failed locally; codecov/project is also failing.


🔗 Please also check the failed codecov/project status: https://app.codecov.io/gh/apache/hugegraph/pull/3060

@Override
public <V> Iterator<HugeTask<V>> tasks(List<Id> ids) {
return this.tasksWithoutResult(ids);
return this.tasks(ids, false);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

‼️ Preserve the existing tasks(...) result contract

The new TaskScheduler default methods keep the old behavior as tasks(ids) -> tasks(ids, true), but this override changes the task/result-backed scheduler to return metadata-only tasks by default. Existing Java callers of scheduler.tasks(...).next().result() can now silently get null while scheduler.task(id) still returns the result. Please keep the no-result path opt-in by removing these overrides or delegating them to withResult=true, and pass false only from the metadata-only call sites such as REST list, restore, delete, and scheduler scans.

TaskScheduler scheduler = graph.taskScheduler();
CountDownLatch latch = new CountDownLatch(1);

TaskCallable<String> callable = new TaskCallable<String>() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Use a reloadable task callable in this regression test

This anonymous TaskCallable is persisted by class name, then the worker reloads it through TaskCallable.fromClass(). It has no public no-arg constructor, so the task fails to load as TaskCallable$1, and scheduler.waitUntilTaskCompleted(id, 10) times out. I reproduced it with mvn test -pl hugegraph-server/hugegraph-test -am -P unit-test -Dtest=TaskCoreTest#testTaskWithoutResult -DfailIfNoTests=false -ntp. Please use an existing reloadable callable or add a small static callable class for this test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api Changes of API bug Something isn't working size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants