lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sirish Vadala <vsirishre...@yahoo.co.in>
Subject RE: Phrase Query Problem
Date Tue, 18 Dec 2007 18:25:43 GMT

ok, thnx... I will implement using the WhiteSpaceAnalyzer... Let me check the
indexing speed... I mean time taken to index my data set... If that takes
too long then probably I will look into implementing a custom analyzer...


Zhang, Lisheng wrote:
> 
> Hi,
> 
> In case you donot want to toss away any stop words and even
> preserve case, WhiteSpaceAnalyzer can be used, also using
> WhiteSpaceTokenizer would serve as a test (but need to reindex 
> whole data set first), to make sure there is no other problems.
> 
> Best regards, Lisheng
> 
> 
> 
> -----Original Message-----
> From: mark harwood [mailto:markharw00d@yahoo.co.uk]
> Sent: Tuesday, December 18, 2007 9:42 AM
> To: java-user@lucene.apache.org
> Subject: Re: Phrase Query Problem
> 
> 
> You could write a custom analyzer that drops stopwords but adds an extra 1
> to the "positionIncrement" property for the next valid Token after each
> omiited stop word.
> 
> This would retain the benefit of removing stopwords from your index and
> yet
> prevent your example phrases matching (because the remaining words are not
> recorded as being directly next to each other)
> 
> Cheers
> Mark
> 
> 
> ----- Original Message ----
> From: Sirish Vadala <vsirishreddy@yahoo.co.in>
> To: java-user@lucene.apache.org
> Sent: Tuesday, 18 December, 2007 5:10:19 PM
> Subject: RE: Phrase Query Problem
> 
> 
> Yes... If my query phrase is "Health Safety", docs with "Health and
>  Safety",
> "Health or Safety" are being returned...
> 
> So... Is there any other way to handle this situation... Especially in
>  the
> above mentioned case, the user is expecting around 5 records and the
>  query
> is fetching more than 550 records.8-O
> 
> Thanks.
> 
> 
> Zhang, Lisheng wrote:
>> 
>> Hi,
>> 
>> Do you mean that your query phrase is "Health Safety",
>> but docs with "Health and Safety" returned?
>> 
>> If that is the case, the reason is that StandardAnalyzer
>> filters out "and" (also "or, "in" and others) as stop 
>> words during indexing, and the QueryParser filters those
>> words out also.
>> 
>> Best regards, Lisheng
>> 
>> -----Original Message-----
>> From: Sirish Vadala [mailto:vsirishreddy@yahoo.co.in]
>> Sent: Monday, December 17, 2007 9:49 AM
>> To: java-user@lucene.apache.org
>> Subject: Phrase Query Problem
>> 
>> 
>> 
>> I have the following code for search:
>> 
>> BooleanQuery bQuery = new BooleanQuery();
>> Query queryAuthor;
>> queryAuthor = new TermQuery(new Term(IFIELD_LEAD_AUTHOR,
>> author.trim().toLowerCase()));
>> bQuery.add(queryAuthor, BooleanClause.Occur.MUST);
>> ....................................................................
>> ....................................................................
>> 
>> PhraseQuery pQuery = new PhraseQuery();
>> String[] phrase = txtWithPhrase.toLowerCase().split(" ");
>> for (int i = 0; i < phrase.length; i++) {
>>     pQuery.add(new Term(IFIELD_TEXT, phrase[i]));
>> }
>> pQuery.setSlop(0);
>> bQuery.add(pQuery, BooleanClause.Occur.MUST);
>> ....................................................................
>> ....................................................................
>> 
>> String[] sortOrder = {IFIELD_LEAD_AUTHOR, IFIELD_TEXT};
>> Sort sort = new Sort(sortOrder);
>> hits = indexSearcher.search(bQuery, sort);
>> 
>> Now My problem here is: If I do a search on a phrase with text Health
>> Safety, it is fetching me all the records where in the text is Health
>> and/or/in Safety. It is fetching me these records even after setting
>  the
>> slop of the phrase query to zero for exact match. I am using standard
>> analyzer while indexing my records.
>> 
>> Any help on this is greatly appreciated. 
>> 
>> Sirish Vadala
>> -- 
>> View this message in context:
>> http://www.nabble.com/Phrase-Query-Problem-tp14373945p14373945.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 
>> 
>> 
> 
> -- 
> View this message in context:
>  http://www.nabble.com/Phrase-Query-Problem-tp14373945p14401354.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 
> 
> 
> 
>       __________________________________________________________
> Sent from Yahoo! Mail - a smarter inbox http://uk.mail.yahoo.com
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Phrase-Query-Problem-tp14373945p14402820.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message