skip empty BooleanQuery with Occur=SHOULD#4475
Conversation
| Query query = clause.getKey().getQuery(); | ||
| Occur occur = clause.getValue(); | ||
|
|
||
| if (occur == Occur.SHOULD && query instanceof BooleanQuery boolQ && boolQ.clauses().isEmpty()) { |
There was a problem hiding this comment.
this string of conditions looks very hacky. Reading the description... I thought there was going to be a null somewhere to check for?
There was a problem hiding this comment.
I can only work with what the LuceneQParser returns, and in this case it is a BooleanQuery with an empty List of clauses. And as I understand it, I can only safely drop a Query with an empty list of clauses if it is a SHOULD. I do not see any attribute on BooleanQuery I could check for null.
There was a problem hiding this comment.
The description says:
org.apache.solr.search.join.FiltersQParser#parseImpl can return empty-queries
If you dig deeper... this isn't a fault/problem of FiltersQParser specifically or that method, it's that broadly, any QParser parse() may return either null or return an empty BooleanQuery. It'd be great if you pursue fixing that. But I don't think FiltersQParser needs a modification.
There was a problem hiding this comment.
I see, the root cause is that org.apache.solr.parser.SolrQueryParserBase#newFieldQuery calls org.apache.lucene.util.QueryBuilder#createFieldQuery, and this method returns null
if the queryText is entirely remove due to only containing a stop-word. This null is then propagated upwards.
What would be the alternative, returning a MatchNoDocsQuery instead of null?
There was a problem hiding this comment.
Ahh. IMO it's make sense to put logic in org.apache.solr.schema.TextField#getFieldQuery which is on the call chain you reference. Logic to prevent null or an empty boolean query -- either should simply return MatchNoDocsQuery. Ideally the scope would be broader than this getFieldQuery method which is somewhat obscure and thus communicating this change alone would be awkward -- a clue that the scope isn't right. It'd be more useful & clearer to say any analysis yielding no tokens results in a MatchNoDocsQuery, if we can make such a claim. Of course, feel free to look further and propose something.
There was a problem hiding this comment.
I don't think org.apache.solr.schema.TextField#getFieldQuery is called, the call stack to arrive at org.apache.lucene.util.QueryBuilder#createFieldQuery is:
at org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:257)
at org.apache.solr.parser.SolrQueryParserBase.newFieldQuery(SolrQueryParserBase.java:529)
at org.apache.solr.parser.QueryParser.newFieldQuery(QueryParser.java:68)
at org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:1122)
at org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:861)
at org.apache.solr.parser.QueryParser.Term(QueryParser.java:454)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:293)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:173)
at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:143)
at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:276)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:51)
at org.apache.solr.search.QParser.getQuery(QParser.java:197)
at org.apache.solr.search.join.FiltersQParser.parseImpl(FiltersQParser.java:64)
at org.apache.solr.search.BoolQParserPlugin$1.parseImpl(BoolQParserPlugin.java:52)
at org.apache.solr.search.BoolQParserPlugin$1.parse(BoolQParserPlugin.java:47)
at org.apache.solr.search.QParser.getQuery(QParser.java:197)
at org.apache.solr.search.TestMmBoolQParserPlugin.parseQuery(TestMmBoolQParserPlugin.java:49)
at org.apache.solr.search.TestMmBoolQParserPlugin.testMinShouldMatchThresholdsLower(TestMmBoolQParserPlugin.java:103)
From what you are saying, it wouldn't be sufficient to just check if org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:257) returns null and then return MatchNoDocsQuery instead of passing null up the stack?
There was a problem hiding this comment.
yes; try it any ways. We'll see. It'll be interesting to see if there are test failures that make assumptions.
Seems SolrQueryParserBase.newFieldQuery is a good spot for this change. Or override Lucene level createFieldQuery as it's the lowest spot we can do and it should cover a wider net.
| } | ||
|
|
||
| @Test | ||
| public void testMinShouldMatchWithEmptyClauseCausedByStopWord() throws Exception { |
There was a problem hiding this comment.
test fails: undefined field name_sw
@dsmiley
This is a follow up of #4406
I realized that the minimum match expression like mm=-1 does not work properly if one should-clause is entirely removed by the analyzer, e.g. it only contains stopwords.
The root cause is that org.apache.solr.search.join.FiltersQParser#parseImpl can return empty-queries.
Given you comment on that method added quite recently, I am not sure if this is the right approach, but this change would fix the root of the issue. Alternatively I could just filter them out in the BoolQParserPlugin.
Do you know if such the empty clauses should only be filtered out if mmAutoRelax==true? If that is the case I need to do it in BoolQParserPlugin anyway.
Thanks