lucene-java-user mailing list archives

From "Zhang, Lisheng" <Lisheng.Zh...@broadvision.com>
Subject RE: Phrase Query Problem
Date Tue, 18 Dec 2007 18:10:40 GMT
Hi,

If you do not want to discard any stop words, and also want to
preserve case, you can use WhitespaceAnalyzer. Using
WhitespaceTokenizer would also serve as a test (you need to reindex
the whole data set first) to make sure there are no other problems.
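
To illustrate the effect, here is a minimal library-free sketch. It is
only a model of whitespace-only analysis, not Lucene code, and the class
and method names (WhitespaceDemo, tokenize, containsPhrase) are made up
for this example. It assumes tokens are split on whitespace and nothing
else is changed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Model only: these names are hypothetical and are not Lucene API.
class WhitespaceDemo {

    /** Splits on whitespace; keeps stop words and keeps case as written. */
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** True if the phrase terms occur next to each other, in order. */
    static boolean containsPhrase(List<String> tokens, String... phrase) {
        if (phrase.length == 0) {
            return false;
        }
        List<String> wanted = Arrays.asList(phrase);
        for (int start = 0; start + phrase.length <= tokens.size(); start++) {
            if (tokens.subList(start, start + phrase.length).equals(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

In this model "Health and Safety" keeps all three tokens, so the phrase
"Health Safety" no longer matches it. Note that because case is
preserved, query terms would have to be passed as typed; lowercasing
them on the query side (as the quoted search code does) would stop them
from matching "Health".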

Best regards, Lisheng



-----Original Message-----
From: mark harwood [mailto:markharw00d@yahoo.co.uk]
Sent: Tuesday, December 18, 2007 9:42 AM
To: java-user@lucene.apache.org
Subject: Re: Phrase Query Problem


You could write a custom analyzer that drops stopwords but adds an extra 1
to the "positionIncrement" property of the next valid Token after each
omitted stop word.

This would retain the benefit of removing stopwords from your index and yet
prevent your example phrases matching (because the remaining words are not
recorded as being directly next to each other)
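
A rough sketch of the idea, written without Lucene so it stands alone.
Everything here (PositionGapDemo, Posting, analyze, matchesPhrase, and
the three-word stop list) is hypothetical and only models how positions
behave. In Lucene itself the corresponding hook would be
Token.setPositionIncrement(int) inside your own filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Model only: these names are hypothetical and are not Lucene API.
class PositionGapDemo {

    // Assumed stop list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("and", "or", "in"));

    /** A term plus the absolute position recorded for it. */
    static final class Posting {
        final String term;
        final int position;

        Posting(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /**
     * Lowercases, splits on whitespace and drops stop words. With keepGaps
     * set, a dropped word still consumes a position, which mirrors adding
     * 1 to the position increment of the next kept token.
     */
    static List<Posting> analyze(String text, boolean keepGaps) {
        List<Posting> out = new ArrayList<>();
        int position = -1;
        for (String word : text.trim().toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (STOP_WORDS.contains(word)) {
                if (keepGaps) {
                    position++; // leave a hole where the stop word was
                }
                continue;
            }
            position++;
            out.add(new Posting(word, position));
        }
        return out;
    }

    /** Slop-0 phrase match: terms must sit at consecutive positions. */
    static boolean matchesPhrase(List<Posting> doc, String... phrase) {
        if (phrase.length == 0) {
            return false;
        }
        for (Posting start : doc) {
            if (!start.term.equals(phrase[0])) {
                continue;
            }
            boolean all = true;
            for (int i = 1; i < phrase.length && all; i++) {
                all = hasTermAt(doc, phrase[i], start.position + i);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    private static boolean hasTermAt(List<Posting> doc, String term, int position) {
        for (Posting p : doc) {
            if (p.position == position && p.term.equals(term)) {
                return true;
            }
        }
        return false;
    }
}
```

With keepGaps false, "Health and Safety" is recorded as health at 0 and
safety at 1, so the phrase "health safety" matches it. With keepGaps
true, safety moves to position 2 and the phrase no longer matches,
while a document that really says "Health Safety" still does.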

Cheers
Mark


----- Original Message ----
From: Sirish Vadala <vsirishreddy@yahoo.co.in>
To: java-user@lucene.apache.org
Sent: Tuesday, 18 December, 2007 5:10:19 PM
Subject: RE: Phrase Query Problem


Yes... If my query phrase is "Health Safety", docs with "Health and Safety",
"Health or Safety" are being returned...

So... Is there any other way to handle this situation... Especially in the
above mentioned case, the user is expecting around 5 records and the query
is fetching more than 550 records. 8-O

Thanks.


Zhang, Lisheng wrote:
> 
> Hi,
> 
> Do you mean that your query phrase is "Health Safety",
> but docs with "Health and Safety" returned?
> 
> If that is the case, the reason is that StandardAnalyzer
> filters out "and" (also "or", "in" and others) as stop 
> words during indexing, and the QueryParser filters those
> words out also.
> 
> Best regards, Lisheng
> 
> -----Original Message-----
> From: Sirish Vadala [mailto:vsirishreddy@yahoo.co.in]
> Sent: Monday, December 17, 2007 9:49 AM
> To: java-user@lucene.apache.org
> Subject: Phrase Query Problem
> 
> 
> 
> I have the following code for search:
> 
> BooleanQuery bQuery = new BooleanQuery();
> Query queryAuthor;
> queryAuthor = new TermQuery(new Term(IFIELD_LEAD_AUTHOR,
> author.trim().toLowerCase()));
> bQuery.add(queryAuthor, BooleanClause.Occur.MUST);
> ....................................................................
> ....................................................................
> 
> PhraseQuery pQuery = new PhraseQuery();
> String[] phrase = txtWithPhrase.toLowerCase().split(" ");
> for (int i = 0; i < phrase.length; i++) {
>     pQuery.add(new Term(IFIELD_TEXT, phrase[i]));
> }
> pQuery.setSlop(0);
> bQuery.add(pQuery, BooleanClause.Occur.MUST);
> ....................................................................
> ....................................................................
> 
> String[] sortOrder = {IFIELD_LEAD_AUTHOR, IFIELD_TEXT};
> Sort sort = new Sort(sortOrder);
> hits = indexSearcher.search(bQuery, sort);
> 
> Now my problem here is: if I search on a phrase with the text Health
> Safety, it fetches all the records in which the text is Health
> and/or/in Safety. It fetches these records even after setting the
> slop of the phrase query to zero for an exact match. I am using the
> standard analyzer while indexing my records.
> 
> Any help on this is greatly appreciated. 
> 
> Sirish Vadala
> -- 
> View this message in context:
> http://www.nabble.com/Phrase-Query-Problem-tp14373945p14373945.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

-- 
View this message in context:
 http://www.nabble.com/Phrase-Query-Problem-tp14373945p14401354.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

