Skip to content

skip empty BooleanQuery with Occur=SHOULD#4475

Open
renatoh wants to merge 5 commits into
apache:mainfrom
renatoh:handling_empty_queries_in_FiltersQParser
Open

skip empty BooleanQuery with Occur=SHOULD#4475
renatoh wants to merge 5 commits into
apache:mainfrom
renatoh:handling_empty_queries_in_FiltersQParser

Conversation

@renatoh
Copy link
Copy Markdown
Contributor

@renatoh renatoh commented May 29, 2026

@dsmiley
This is a follow up of #4406

I realized that the minimum match expression like mm=-1 does not work properly if one should-clause is entirely removed by the analyzer, e.g. it only contains stopwords.

The root cause is that org.apache.solr.search.join.FiltersQParser#parseImpl can return empty-queries.
Given you comment on that method added quite recently, I am not sure if this is the right approach, but this change would fix the root of the issue. Alternatively I could just filter them out in the BoolQParserPlugin.

Do you know if such the empty clauses should only be filtered out if mmAutoRelax==true? If that is the case I need to do it in BoolQParserPlugin anyway.

Thanks

Query query = clause.getKey().getQuery();
Occur occur = clause.getValue();

if (occur == Occur.SHOULD && query instanceof BooleanQuery boolQ && boolQ.clauses().isEmpty()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this string of conditions looks very hacky. Reading the description... I thought there was going to be a null somewhere to check for?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can only work with what the LuceneQParser returns, and in this case it is a BooleanQuery with an empty List of clauses. And as I understand it, I can only safely drop a Query with an empty list of clauses if it is a SHOULD. I do not see any attribute on BooleanQuery I could check for null.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description says:

org.apache.solr.search.join.FiltersQParser#parseImpl can return empty-queries

If you dig deeper... this isn't a fault/problem of FiltersQParser specifically or that method, it's that broadly, any QParser parse() may return either null or return an empty BooleanQuery. It'd be great if you pursue fixing that. But I don't think FiltersQParser needs a modification.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the root cause is that org.apache.solr.parser.SolrQueryParserBase#newFieldQuery calls org.apache.lucene.util.QueryBuilder#createFieldQuery, and this method returns null
if the queryText is entirely remove due to only containing a stop-word. This null is then propagated upwards.
What would be the alternative, returning a MatchNoDocsQuery instead of null?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh. IMO it's make sense to put logic in org.apache.solr.schema.TextField#getFieldQuery which is on the call chain you reference. Logic to prevent null or an empty boolean query -- either should simply return MatchNoDocsQuery. Ideally the scope would be broader than this getFieldQuery method which is somewhat obscure and thus communicating this change alone would be awkward -- a clue that the scope isn't right. It'd be more useful & clearer to say any analysis yielding no tokens results in a MatchNoDocsQuery, if we can make such a claim. Of course, feel free to look further and propose something.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think org.apache.solr.schema.TextField#getFieldQuery is called, the call stack to arrive at org.apache.lucene.util.QueryBuilder#createFieldQuery is:

at org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:257)
at org.apache.solr.parser.SolrQueryParserBase.newFieldQuery(SolrQueryParserBase.java:529)
at org.apache.solr.parser.QueryParser.newFieldQuery(QueryParser.java:68)
at org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:1122)
at org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:861)
at org.apache.solr.parser.QueryParser.Term(QueryParser.java:454)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:293)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:173)
at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:143)
at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:276)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:51)
at org.apache.solr.search.QParser.getQuery(QParser.java:197)
at org.apache.solr.search.join.FiltersQParser.parseImpl(FiltersQParser.java:64)
at org.apache.solr.search.BoolQParserPlugin$1.parseImpl(BoolQParserPlugin.java:52)
at org.apache.solr.search.BoolQParserPlugin$1.parse(BoolQParserPlugin.java:47)
at org.apache.solr.search.QParser.getQuery(QParser.java:197)
at org.apache.solr.search.TestMmBoolQParserPlugin.parseQuery(TestMmBoolQParserPlugin.java:49)
at org.apache.solr.search.TestMmBoolQParserPlugin.testMinShouldMatchThresholdsLower(TestMmBoolQParserPlugin.java:103)

From what you are saying, it wouldn't be sufficient to just check if org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:257) returns null and then return MatchNoDocsQuery instead of passing null up the stack?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes; try it any ways. We'll see. It'll be interesting to see if there are test failures that make assumptions.
Seems SolrQueryParserBase.newFieldQuery is a good spot for this change. Or override Lucene level createFieldQuery as it's the lowest spot we can do and it should cover a wider net.

}

@Test
public void testMinShouldMatchWithEmptyClauseCausedByStopWord() throws Exception {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test fails: undefined field name_sw

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants