lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: easy one? IN and OR stopword help
Date Thu, 07 Jun 2012 23:54:51 GMT
It depends on whether the query parser is smart enough to optimize away 
empty boolean terms. Otherwise, the semantics of "x AND y" (or BooleanQuery 
with two "MUST" clauses) is the intersection of the documents selected by 
matching x and the documents selected by matching y. If y selects no 
documents, the intersection will be empty. Analysis is a separate semantic 
step from syntactic parsing, so if y is a stopword or a quoted phrase 
containing only a stopword, it parses fine, but a dumb query parser might 
generate a TermQuery with an empty term, which will match no documents.

Or, if stopword filtering is disabled at query time but was enabled at index 
time, the TermQuery would refer to a term that was never written to the index.
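
To make the two cases concrete, here is a minimal toy sketch in plain Java. 
It does not use Lucene's API at all: the class StopwordAndSketch, the 
analyze() and search() helpers, the dropEmpty flag and the tiny index are 
all hypothetical names invented for illustration. It only models "AND" as 
set intersection over postings, and compares a parser that keeps an empty 
term with one that drops it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class StopwordAndSketch {
    // Assumed stopword list, taken from the words discussed in this thread.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("and", "or", "in"));

    // Hypothetical analysis step: lowercase, and reduce a stopword to "".
    static String analyze(String raw) {
        String term = raw.toLowerCase(Locale.ROOT);
        return STOPWORDS.contains(term) ? "" : term;
    }

    // Toy inverted index: term -> ids of documents containing that term.
    static Map<String, Set<Integer>> demoIndex() {
        Map<String, Set<Integer>> index = new HashMap<>();
        index.put("jack", new HashSet<>(Arrays.asList(1, 2)));
        index.put("jill", new HashSet<>(Arrays.asList(2, 3)));
        return index;
    }

    // Intersect the postings of every required term. With dropEmpty == false
    // an empty term is looked up like any other term, finds nothing, and
    // empties the whole intersection. With dropEmpty == true the clause is
    // skipped. A query left with no clauses returns nothing (an assumption
    // of this sketch).
    static Set<Integer> search(Map<String, Set<Integer>> index,
                               List<String> mustTerms, boolean dropEmpty) {
        Set<Integer> result = null;
        for (String raw : mustTerms) {
            String term = analyze(raw);
            if (term.isEmpty() && dropEmpty) {
                continue;
            }
            Set<Integer> postings =
                    index.getOrDefault(term, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(postings);
            } else {
                result.retainAll(postings);
            }
        }
        return result == null ? new TreeSet<Integer>() : result;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = demoIndex();
        List<String> query = Arrays.asList("jack", "and");
        System.out.println("keep empty term: " + search(index, query, false));
        System.out.println("drop empty term: " + search(index, query, true));
    }
}
```

In this model the "dumb" parser returns no documents for jack AND and, 
while the one that drops the empty clause returns the documents matching 
jack alone.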

-- Jack Krupansky

-----Original Message----- 
From: Trejkaz
Sent: Thursday, June 07, 2012 5:44 PM
To: java-user@lucene.apache.org
Subject: Re: easy one? IN and OR stopword help

On Fri, Jun 8, 2012 at 5:35 AM, Jack Krupansky <jack@basetechnology.com> 
wrote:
> Well, if you have defined OR/or and IN/in as stopwords, what is it you 
> expect other than for the analyzer to ignore those terms (which with a 
> boolean “AND” means match nothing)?

Is this behaviour really logical?

If I search for a single phrase like "Jack and Jill", and "and" is a
stop word, it becomes "Jack - Jill", right? And then matches documents
which have Jack and Jill next to each other (although I'm not 100%
sure on whether term positions mess it up for this specific case as I
can't remember whether the term position increments on a stop word or
not. It's irrelevant for the next step in my logic anyway.)

If I search for a single term like "and" and "and" is a stop word, the
equivalent behaviour should be to search for [] (the empty term set),
and every item matches the empty term set, so {X} AND "and" should
return the same as {X} for any query {X}, I would have thought.
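
A minimal sketch of that reading, in plain Java rather than Lucene: the 
class EmptyTermSetSketch and the matchesAll() helper are hypothetical names, 
and the code only models "a document matches a term set if it contains 
every term in the set". It illustrates the argument above, not what any 
query parser actually does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EmptyTermSetSketch {
    // True when the document contains every required term. For an empty
    // list of required terms, allMatch is vacuously true.
    static boolean matchesAll(Set<String> docTerms, List<String> required) {
        return required.stream().allMatch(docTerms::contains);
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("jack", "jill"));
        List<String> x = Arrays.asList("jack");
        List<String> empty = Collections.emptyList();

        // {X} AND [] gives the same answer as {X} alone under this reading.
        boolean xAlone = matchesAll(doc, x);
        boolean xAndEmpty = matchesAll(doc, x) && matchesAll(doc, empty);
        System.out.println(xAlone == xAndEmpty);
    }
}
```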

Is this some peculiarity with boolean query or query parser implementation?

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org 

