lucene-java-user mailing list archives

From Philip Chan <PC...@Biz360.com>
Subject RE: GoogleQueryParser
Date Fri, 13 Sep 2002 00:03:01 GMT
I called it a bug because it is not consistent with how QueryParser
handled queries in 1.2.
In 1.2, "a AND b OR c" means "+a +b c", and "c OR a AND b" means "c +a +b".

In this case, however, searching for "a b OR c" is not the same as searching
for "a AND b OR c", even if I call setDefaultOperator(AND) first.
One would expect them to mean the same thing, because whenever an
operator is not specified, it should default to AND.

Basically, output#1 and output#2 from the code below differ, while I
expect them to be the same:

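import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

// Default operator left unchanged (OR)
QueryParser qp = new QueryParser("field",
    new org.apache.lucene.analysis.SimpleAnalyzer());
Query q = qp.parse("a AND b OR c");
System.out.println(q.toString("field"));  // output#1

// Default operator switched to AND
qp.setDefaultOperator(QueryParser.DEFAULT_OPERATOR_AND);
q = qp.parse("a b OR c");
System.out.println(q.toString("field"));  // output#2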



Philip

-----Original Message-----
From: Halácsy Péter [mailto:halacsy.peter@axelero.com]
Sent: Wednesday, September 11, 2002 11:58 PM
To: Lucene Users List
Subject: RE: GoogleQueryParser




>-----Original Message-----
>From: Philip Chan [mailto:PChan@Biz360.com]
>Sent: Wednesday, September 11, 2002 11:04 PM
>To: Lucene Users List
>Subject: RE: GoogleQueryParser
>
>
>I think there's a bug. If I set the default operator to OR and run
>
>java org.apache.lucene.queryParser.QueryParser "a AND b OR c"
>
>it gives me "+a +b c".
>If I set the default operator to AND and run it with the query
>"a b OR c", it gives me "+a b c", which is different.

To be exact, it's not a bug, it's a feature ;) Well, the structured query
language of Lucene (and Google and others) is not a strict boolean language.
For example, I think the QueryParser of Lucene does not support parentheses:
a AND (b OR c)

Instead of strict boolean logic, it supports constraints on query terms: a
query term is either required, optional, or prohibited. If you write a + sign
before the term, it is required. If you write -, it is prohibited.
The question is: is a term required or optional if you do not specify
anything?

DEFAULT_OPERATOR_OR (default QueryParser):
A B C --> all three terms are optional

DEFAULT_OPERATOR_AND (Google style):
A B C --> +A +B +C all three terms are required.
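
As a minimal sketch of the rule above (a toy model, not Lucene's
QueryParser), the class below rewrites a query made only of bare terms
and +/- prefixed terms. The class name TermConstraintSketch, the method
translate, and the boolean flag standing in for the default operator are
all hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Lucene's QueryParser. It handles bare terms
// with an optional + or - prefix and does not understand AND/OR keywords.
final class TermConstraintSketch {

    /** Rewrites each whitespace-separated term as +term, -term, or term. */
    static String translate(String query, boolean defaultIsAnd) {
        List<String> out = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields one empty token
            }
            if (token.startsWith("+") || token.startsWith("-")) {
                out.add(token);       // an explicit constraint always wins
            } else if (defaultIsAnd) {
                out.add("+" + token); // unmarked term becomes required
            } else {
                out.add(token);       // unmarked term stays optional
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(translate("A B C", false)); // A B C
        System.out.println(translate("A B C", true));  // +A +B +C
        System.out.println(translate("A -B C", true)); // +A -B +C
    }
}
```

The only decision the default operator makes in this model is what to do
with a term that carries no + or - of its own.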

Because "a b OR c" is not a strict boolean query, the query parser can
choose how to translate it. "+a +b c" is not a good choice, since it does
not equal the result of the input query "c OR a b".
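
To make the two results discussed in this thread concrete, here is a
small toy translator. It is only a sketch: the rules in it are
assumptions chosen so that it reproduces the two outputs reported above
("+a +b c" and "+a b c"), and the real parser may differ in other cases.
The names ConjunctionSketch and translate are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Lucene's QueryParser. Assumed rules:
//  - AND before a term makes that term and the previous term required.
//  - With default AND, OR before a term makes that term and the
//    previous term optional; a term with no conjunction is required.
//  - With default OR, a term is required only when introduced by AND.
final class ConjunctionSketch {

    static String translate(String query, boolean defaultIsAnd) {
        List<String> terms = new ArrayList<>();
        List<Boolean> required = new ArrayList<>();
        String conj = ""; // conjunction seen just before the current term

        for (String token : query.trim().split("\\s+")) {
            if (token.equals("AND") || token.equals("OR")) {
                conj = token;
                continue;
            }
            if (token.isEmpty()) {
                continue;
            }
            int last = terms.size() - 1;
            if (last >= 0 && conj.equals("AND")) {
                required.set(last, true);  // AND pulls the previous term in
            }
            if (last >= 0 && defaultIsAnd && conj.equals("OR")) {
                required.set(last, false); // OR relaxes the previous term
            }
            boolean req = defaultIsAnd ? !conj.equals("OR") : conj.equals("AND");
            terms.add(token);
            required.add(req);
            conj = "";
        }

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            if (required.get(i)) {
                sb.append('+');
            }
            sb.append(terms.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("a AND b OR c", false)); // +a +b c
        System.out.println(translate("a b OR c", true));      // +a b c
    }
}
```

In this model the difference comes from the OR keyword: under a default
of AND it relaxes the term before it, so b loses its + while a keeps it.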

peter






>
>-----Original Message-----
>From: Halácsy Péter [mailto:halacsy.peter@axelero.com]
>Sent: Wednesday, September 11, 2002 4:49 AM
>To: Lucene Users List; Clemens Marschner
>Subject: RE: GoogleQueryParser
>
>
>
>
>>-----Original Message-----
>>From: Eric Jain [mailto:Eric.Jain@isb-sib.ch]
>>Sent: Wednesday, September 11, 2002 1:44 PM
>>To: Clemens Marschner
>>Cc: Lucene Users List
>>Subject: Re: GoogleQueryParser
>>
>>
>>> queryParser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
>>
>>Thanks, that would be exactly what I need. Must be a new
>>method, not yet in the public release?
>>
>Check out the new QueryParser from CVS.
>
>peter
>
>--
>To unsubscribe, e-mail:
><mailto:lucene-user-unsubscribe@jakarta.apache.org>
>For additional commands, e-mail:
><mailto:lucene-user-help@jakarta.apache.org>

