[opt](job) delay Kafka read committed zero-row retries#64046
Open
sollhui wants to merge 1 commit into
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Contributor
Author
|
run buildall |
Contributor
Author
|
/review |
Contributor
Review result: no blocking issues found in the changed code.
Critical checkpoint conclusions:
- Goal and proof: The PR delays the renewed Kafka routine-load task when a read_committed task commits with zero consumed rows while its lag remains positive. The implementation reuses the existing zero-row positive-lag predicate and adds a focused unit test covering propagation to the renewed task.
- Scope and clarity: The change is small and focused on RoutineLoadTaskInfo/KafkaTaskInfo/KafkaRoutineLoadJob scheduling state.
- Concurrency and locking: The new state is set in the existing transaction status handling path under the routine-load job write lock and copied during the existing renew path. No new lock ordering or heavy locked operation was introduced.
- Lifecycle/static state: No new static initialization dependency or special ownership lifecycle was introduced.
- Configuration/compatibility: No new config, persisted format, FE-BE protocol field, or incompatible storage/API change was introduced.
- Parallel paths: Non-Kafka routine-load tasks keep the default false delay predicate; Kafka renew now preserves both caller-requested delay and the task-derived delay. Kinesis behavior is unchanged.
- Conditional checks: The new condition matches the existing read_committed zero-row positive-lag hint logic.
- Tests: A unit test was added for the new delayed-renew behavior. I did not run FE tests locally because the required FE build prerequisite thirdparty/installed/bin/protoc is missing in this runner.
- Observability: Existing OtherMsg observability for this condition remains in place; no additional metrics/logs appear necessary for this scheduling-only change.
- Transaction/persistence/data correctness: The change affects retry timing only and does not alter data visibility, offset advancement semantics, or transaction commit rules.
- Performance: The added predicate reuses small in-memory maps and does not introduce material overhead.
User focus: No additional user-provided review focus was specified.
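The delayed-renew rule summarized in the checkpoints above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Doris implementation: the class DelayedRenewSketch, the TaskResult type, its fields, and both method names are hypothetical, and treating "positive lag" as "any partition with lag greater than zero" is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Doris code base.
final class DelayedRenewSketch {

    /** Hypothetical summary of a finished Kafka routine-load task. */
    static final class TaskResult {
        final boolean readCommitted;              // isolation.level=read_committed
        final long consumedRows;                  // rows consumed by the finished task
        final Map<Integer, Long> lagByPartition;  // partition id -> remaining lag

        TaskResult(boolean readCommitted, long consumedRows, Map<Integer, Long> lagByPartition) {
            this.readCommitted = readCommitted;
            this.consumedRows = consumedRows;
            this.lagByPartition = new HashMap<>(lagByPartition);
        }
    }

    /** Assumed predicate: read_committed, zero rows consumed, some partition still lagging. */
    static boolean isZeroRowWithPositiveLag(TaskResult result) {
        if (!result.readCommitted || result.consumedRows != 0) {
            return false;
        }
        for (long lag : result.lagByPartition.values()) {
            if (lag > 0) {
                return true;
            }
        }
        return false;
    }

    /**
     * The renewed task is delayed by maxBatchIntervalMs when either the caller
     * asked for a delay or the previous task matched the predicate above;
     * otherwise it is eligible to run immediately.
     */
    static long nextScheduleTimeMs(long nowMs, boolean callerRequestedDelay,
                                   TaskResult previous, long maxBatchIntervalMs) {
        boolean delay = callerRequestedDelay || isZeroRowWithPositiveLag(previous);
        return delay ? nowMs + maxBatchIntervalMs : nowMs;
    }
}
```

In this sketch the caller-requested delay and the task-derived delay are combined with a logical OR, mirroring the review's note that the Kafka renew path preserves both.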
Contributor
|
run buildall |
Contributor
|
PR approved by at least one committer and no changes requested. |
Contributor
|
PR approved by anyone and no changes requested. |
Contributor
TPC-H: Total hot run time: 28809 ms |
Contributor
TPC-DS: Total hot run time: 169296 ms |
Contributor
FE Regression Coverage Report: Increment line coverage |
What problem does this PR solve?
Kafka routine load with isolation.level=read_committed can finish a task with 0 consumed rows while the task still has positive lag, for example when upstream transactional records are not committed and therefore invisible. PR #63664 added an OtherMsg hint for this case, but the renewed task could still be scheduled immediately when the normal EOF heuristic did not apply, causing repeated retries. This change reuses the same read_committed zero-row lag detection to mark the next Kafka routine load task for delayed scheduling, so it follows the existing max_batch_interval delay path used by EOF tasks.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)