lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: QueryParser/StopAnalyzer question
Date Mon, 23 May 2011 14:13:13 GMT
Hmmm, somehow I missed this days ago....

Anyway, the Lucene query parsing process isn't quite Boolean logic.
I encourage you to think in terms of "required", "optional", and
"prohibited".

Both queries are equivalent, to see this try attaching &debugQuery=on
to your URL and look at the "parsed query" in the debug info....

Anyway, to your qestion.
+foo:bar +baz:"there is"

reads that "bar" must appear in the field "foo". So far so good.
But it's also required that baz contain the empty clause, which
is different than saying baz must be empty. One can argue that
any field contains, by definition, nothing.

But imagine the impact of what you're requesting. If all stop words
get removed, then no query would ever match yours. Which
would be very counter-intuitive IMO. Your users have no clue
that you've removed stopwords, so they'll sit there saying "Look, I
KNOW that "bar" was in foo and I KNOW that "there is" was in
baz, why the heck didn't this cursed system find my doc?

Anyway, I don't think you really want this behavior in the
stopword removal case. If you can post some use-cases where this
would be desirable, maybe we can noodle about a solution....

Best
Erick


2011/5/23 Mindaugas Žakšauskas <mindas@gmail.com>:
> Not much luck so far :(
>
> Just in case if anyone wants to earn some virtual dosh, I have added
> some 50 bonus points to this question on StackOverflow:
>
> http://stackoverflow.com/questions/6H044061/lucene-query-parsing-behaviour-joining-query-parts-with-and
>
> I also promise to post a solution here if anything satisfactory turns up.
>
> m.
>
> 2011/5/17 Mindaugas Žakšauskas <mindas@gmail.com>:
>> Hi,
>>
>> Let's say we have an index having few documents indexed using
>> StopAnalyzer.ENGLISH_STOP_WORDS_SET. The user issues two queries:
>> 1) foo:bar
>> 2) baz:"there is"
>>
>> Let's assume that the first query yields some results because there
>> are documents matching that query.
>>
>> The second query contains two stopwords ("there" and "is") and yields
>> 0 results. The reason for this is because when baz:"there is" is
>> parsed, it ends up as a void query as both "there" and "is" are
>> stopwords (technically speaking, this is converted to an empty
>> BooleanQuery having no clauses). So far so good.
>>
>> However, any of the following combined queries
>>
>> +foo:bar +baz:"there is"
>> foo:bar AND baz:"there is"
>>
>> behave exactly the same way as query +foo:bar, that is, brings back
>> some results. The second AND part which is supposed to yield no
>> results is completely ignored.
>>
>> One might argue that when ANDing both conditions have to be met, that
>> is, documents having foo=bar and baz being empty have to be retrieved,
>> as when issued seperately, baz:"there is" yields 0 results.
>>
>> It seem contradictory as an atomic query component has different
>> impact on the overall query depending on the context. Is there any
>> logical explanation for this? Can this be addressed in any way,
>> preferably without writing own QueryAnalyzer?
>>
>> If this makes any difference, observed behaviour happens under Lucene v3.0.2.
>>
>> Regards,
>> Mindaugas
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message