[improvement](nereids) Estimate scan row count from selected partitions#64032
Open
foxtail463 wants to merge 1 commit into
Problem Summary:
Scan partition pruning state was not consistently represented across OLAP and external file scans. As a result, CBO row-count estimation needed scattered special handling and could not reliably use the selected partitions produced by pruning.
In addition, partition predicates that had already been applied during partition pruning could be applied again by filter estimation, so their selectivity was counted twice. HMS selected-partition row-count estimation also did not clearly distinguish a valid zero-row result from an unknown row count.
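To make the double counting concrete, here is a minimal sketch with made-up numbers. The class and method names are hypothetical and are not Doris APIs; the point is only that a predicate which already removed partitions shrinks the estimate a second time if filter estimation applies it again.

```java
// Hypothetical illustration; these names are not Doris APIs.
final class DoubleCountingSketch {
    // Rows left after pruning: the sum over the partitions that survived.
    static double rowsAfterPruning(long[] partitionRows, int[] selected) {
        double sum = 0;
        for (int idx : selected) {
            sum += partitionRows[idx];
        }
        return sum;
    }

    // A filter estimator that is unaware of pruning scales the row count by
    // the predicate's selectivity, even if pruning already accounted for it.
    static double applySelectivity(double rows, double selectivity) {
        return rows * selectivity;
    }

    public static void main(String[] args) {
        // Assumed table: 8 partitions with 100 rows each.
        long[] partitionRows = {100, 100, 100, 100, 100, 100, 100, 100};
        // The partition predicate keeps 1 of the 8 partitions.
        double pruned = rowsAfterPruning(partitionRows, new int[] {3});
        // Applying the same predicate again underestimates the scan output.
        double doubleCounted = applySelectivity(pruned, 1.0 / 8);
        System.out.println("after pruning: " + pruned);
        System.out.println("after re-applying the predicate: " + doubleCounted);
    }
}
```

In this assumed setup the pruned estimate is already the right answer for the partition predicate; the second multiplication is the duplicated selectivity described above.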
Solution:
Introduce a unified partition selection state carried by scan plans, including selected partitions, prune status, manual partition constraints, and partition conjuncts already applied to row-count estimation.
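As a rough picture of what such a state object could carry, the sketch below bundles the four pieces listed above into one immutable value. Every name here (`PartitionSelectionSketch`, `PruneStatus`, the field names, the string representation of conjuncts) is an assumption made for illustration and does not reflect the classes actually added by this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the class introduced by the PR.
final class PartitionSelectionSketch {
    enum PruneStatus { NOT_PRUNED, PRUNED }

    final List<Long> selectedPartitionIds;        // partitions the scan will read
    final PruneStatus pruneStatus;                // whether pruning has run
    final List<String> manualPartitionNames;      // partitions named explicitly by the user (assumed meaning)
    final List<String> appliedPartitionConjuncts; // predicates already reflected in the row count

    PartitionSelectionSketch(List<Long> selectedPartitionIds, PruneStatus pruneStatus,
            List<String> manualPartitionNames, List<String> appliedPartitionConjuncts) {
        this.selectedPartitionIds = immutableCopy(selectedPartitionIds);
        this.pruneStatus = pruneStatus;
        this.manualPartitionNames = immutableCopy(manualPartitionNames);
        this.appliedPartitionConjuncts = immutableCopy(appliedPartitionConjuncts);
    }

    private static <T> List<T> immutableCopy(List<T> in) {
        return Collections.unmodifiableList(new ArrayList<>(in));
    }

    // Initial state: every partition is selected and nothing has been pruned.
    static PartitionSelectionSketch unpruned(List<Long> allPartitionIds) {
        return new PartitionSelectionSketch(allPartitionIds, PruneStatus.NOT_PRUNED,
                Collections.<String>emptyList(), Collections.<String>emptyList());
    }

    // State after pruning: keeps the manual constraints, records the survivors
    // and the conjuncts that pruning consumed.
    PartitionSelectionSketch withPruned(List<Long> survivors, List<String> appliedConjuncts) {
        return new PartitionSelectionSketch(survivors, PruneStatus.PRUNED,
                manualPartitionNames, appliedConjuncts);
    }

    boolean isPruned() {
        return pruneStatus == PruneStatus.PRUNED;
    }
}
```

Keeping the value immutable means each scan plan can carry its own snapshot, and a rewrite that prunes further produces a new state instead of mutating a shared one.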
Use the selected partitions to estimate scan row count for both OLAP and external file scans. Preserve the applied partition conjuncts in Statistics so that FilterEstimation can skip predicates that have already affected the row count. Also clarify HMS selected-partition row-count handling so that a zero-row result is treated as valid rather than unknown.
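The following sketch illustrates the two ideas in this paragraph under stated assumptions: row count is summed over the selected partitions with "unknown" kept distinct from zero, and conjuncts already applied by pruning are dropped before filter estimation. The class, the method names, the use of `OptionalLong`, and the choice to report unknown when any selected partition lacks a row count are all illustrative assumptions, not the PR's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;
import java.util.Set;

// Hypothetical sketch; not the code added by the PR.
final class SelectedPartitionEstimateSketch {
    // Sums the row counts of the selected partitions. An empty result means
    // "unknown"; OptionalLong.of(0) means the partitions are known to be empty.
    static OptionalLong estimateRows(List<String> selected, Map<String, Long> knownRowCounts) {
        long total = 0;
        for (String partition : selected) {
            Long rows = knownRowCounts.get(partition);
            if (rows == null) {
                // Assumption: one partition without a row count makes the total unknown.
                return OptionalLong.empty();
            }
            total += rows;
        }
        return OptionalLong.of(total);
    }

    // Returns the conjuncts that filter estimation still has to account for,
    // skipping those whose effect is already part of the scan row count.
    static List<String> remainingConjuncts(List<String> conjuncts, Set<String> alreadyApplied) {
        List<String> remaining = new ArrayList<>();
        for (String conjunct : conjuncts) {
            if (!alreadyApplied.contains(conjunct)) {
                remaining.add(conjunct);
            }
        }
        return remaining;
    }
}
```

A caller would fall back to some other estimate only when `estimateRows` returns an empty value, and would trust a present value of 0 as a real answer.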