lucene-general mailing list archives

From poeta simbolista <poetasimboli...@gmail.com>
Subject Solution for unwanted ngrams
Date Mon, 26 Oct 2009 14:29:34 GMT

Hi,

Imagine you have one text:
"Apartment not for sale"
and another:
"Sale! Apartment for rent"
Search query: "Apartment for sale".
This query returns both texts above with high scores, even though neither of
them offers an apartment for sale. I would like to know how to handle this
better with Lucene. My ideas:
 - Recognise certain phrases such as "not for sale" as different from "for
sale". That is, invalidate "for sale" when it is preceded by "not". How could
I do this?
 - Recognise "sale" only when it is preceded by "for", since the second
meaning (a bargain vs. something being for sale) is tricky.
 - Rewrite "sale" as "for sale", grouped in the query (producing "-sale
+(for sale)"). Wouldn't that query exclude every document that contains the
"sale" term? How else could this be achieved with Lucene?
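
To make the first two ideas concrete, here is a minimal sketch in plain Java with no Lucene calls. The class name PhraseMerger and the merged token names "not_for_sale" and "for_sale" are hypothetical, chosen only for illustration. It assumes the same step would run on both the indexed text and the query, so that "for sale" and "not for sale" become distinct single terms and a bare "sale" stays a separate term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of Lucene. */
class PhraseMerger {

    /** Lowercases, splits on non-letters, and merges known phrases into single tokens. */
    static List<String> tokens(String text) {
        String[] words = text.toLowerCase(Locale.ROOT).split("[^a-z]+");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < words.length) {
            if (words[i].isEmpty()) {
                // split() yields an empty first element when the text starts with a separator
                i++;
            } else if (matches(words, i, "not", "for", "sale")) {
                // Check the longer phrase first so "not for sale" never becomes "for_sale"
                out.add("not_for_sale");
                i += 3;
            } else if (matches(words, i, "for", "sale")) {
                out.add("for_sale");
                i += 2;
            } else {
                out.add(words[i]);
                i++;
            }
        }
        return out;
    }

    /** True if the given phrase occurs in words starting at index start. */
    private static boolean matches(String[] words, int start, String... phrase) {
        if (start + phrase.length > words.length) {
            return false;
        }
        for (int k = 0; k < phrase.length; k++) {
            if (!words[start + k].equals(phrase[k])) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, PhraseMerger.tokens("Apartment not for sale") gives [apartment, not_for_sale], PhraseMerger.tokens("Sale! Apartment for rent") gives [sale, apartment, for, rent], and the query "Apartment for sale" gives [apartment, for_sale]. Neither text then contains the query term for_sale, although both still match on "apartment".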

Should this be tackled only by preprocessing the data before it reaches the
index? Ideally I would like to preserve the original text in the index.

Thanks a lot in advance
 Diego

