[improvement](nereids) Estimate scan row count from selected partitions #64032

Open
foxtail463 wants to merge 1 commit into apache:master from foxtail463:improvement/selected-partition-row-count

@foxtail463
Contributor

Problem Summary:

Scan partition pruning state was not consistently represented across OLAP and external file scans. As a result, CBO row-count estimation needed scattered special handling and could not reliably use the selected partitions produced by pruning.

In addition, partition predicates that had already been applied during partition pruning could be applied a second time by filter estimation, so their selectivity was counted twice. HMS selected-partition row-count estimation also did not clearly distinguish a valid zero-row result from an unknown row count.

Solution:

Introduce a unified partition selection state carried by scan plans, including selected partitions, prune status, manual partition constraints, and partition conjuncts already applied to row-count estimation.
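As a rough illustration of what such a state object might carry, here is a minimal sketch. Every name in it (`PartitionSelectionSketch`, `PartitionSelection`, `PruneStatus`, `unpruned`, `prunedTo`) is hypothetical and chosen for this example; the classes actually added by the PR may look quite different. Conjuncts are represented as plain strings purely to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: all names here are hypothetical, not taken from the PR.
final class PartitionSelectionSketch {
    enum PruneStatus { NOT_PRUNED, PRUNED }

    /** Immutable snapshot of what partition pruning decided for one scan. */
    static final class PartitionSelection {
        final List<Long> selectedPartitionIds;
        final PruneStatus pruneStatus;
        // True when the user named partitions explicitly in the query.
        final boolean manuallySpecified;
        // Partition predicates whose effect is already reflected in the row count.
        final Set<String> appliedPartitionConjuncts;

        private PartitionSelection(List<Long> ids, PruneStatus status,
                boolean manual, Set<String> applied) {
            this.selectedPartitionIds = Collections.unmodifiableList(new ArrayList<>(ids));
            this.pruneStatus = status;
            this.manuallySpecified = manual;
            this.appliedPartitionConjuncts =
                    Collections.unmodifiableSet(new LinkedHashSet<>(applied));
        }

        /** State before pruning: every candidate partition is still selected. */
        static PartitionSelection unpruned(List<Long> allPartitionIds, boolean manual) {
            return new PartitionSelection(allPartitionIds, PruneStatus.NOT_PRUNED,
                    manual, Collections.<String>emptySet());
        }

        /** State after pruning: survivors plus the conjuncts that produced them. */
        PartitionSelection prunedTo(List<Long> survivors, Set<String> conjunctsUsed) {
            return new PartitionSelection(survivors, PruneStatus.PRUNED,
                    manuallySpecified, conjunctsUsed);
        }
    }
}
```

Keeping the object immutable means a rewrite rule produces a new state rather than mutating a shared one, which is one plausible way to let OLAP and external file scans carry the same representation.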

Use selected partitions to estimate scan row count for both OLAP and external file scans. Preserve applied partition conjuncts in Statistics so FilterEstimation can skip predicates that have already affected row count. Also clarify HMS selected-partition row-count handling so zero-row results are treated as valid instead of unknown.
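The two behaviours described here can be sketched in a few lines. This is an assumption-laden illustration, not the PR's code: `SelectedPartitionRowCountSketch`, `estimateScanRows`, and `estimateFilterRows` are hypothetical names, partitions and conjuncts are modelled as strings, and per-conjunct selectivities are simply supplied by the caller. An empty `OptionalLong` stands for "unknown", so a sum of zero remains a valid answer.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;
import java.util.Set;

// Illustrative only: all names here are hypothetical, not taken from the PR.
final class SelectedPartitionRowCountSketch {

    /**
     * Sums row counts over the selected partitions only.
     * Returns empty when any selected partition has no usable count (unknown);
     * returns 0 when the counts are known and genuinely add up to zero.
     */
    static OptionalLong estimateScanRows(List<String> selectedPartitions,
            Map<String, Long> partitionRowCounts) {
        long total = 0;
        for (String partition : selectedPartitions) {
            Long rows = partitionRowCounts.get(partition);
            if (rows == null || rows < 0) {
                return OptionalLong.empty();
            }
            total += rows;
        }
        return OptionalLong.of(total);
    }

    /**
     * Multiplies in the selectivity of each conjunct, skipping conjuncts that
     * partition pruning has already accounted for in the input row count.
     */
    static double estimateFilterRows(double inputRows,
            Map<String, Double> conjunctSelectivity,
            Set<String> appliedPartitionConjuncts) {
        double rows = inputRows;
        for (Map.Entry<String, Double> entry : conjunctSelectivity.entrySet()) {
            if (appliedPartitionConjuncts.contains(entry.getKey())) {
                continue; // already reflected in inputRows
            }
            rows *= entry.getValue();
        }
        return rows;
    }
}
```

For example, with a scan estimated at 100 rows after pruning on a date predicate, a remaining non-partition predicate of selectivity 0.5 yields 50 rows. Applying the date predicate's selectivity again would shrink the estimate a second time, which is the double counting the description refers to.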

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morrySnow
Contributor

/review

@foxtail463 force-pushed the improvement/selected-partition-row-count branch from ad131d6 to 6b3af18 on June 3, 2026 12:16
3 participants