lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Lucene query with long strings
Date Wed, 24 Mar 2010 17:44:56 GMT

On Mar 24, 2010, at 9:20 AM, Shashi Kant wrote:

> Add the common terms such as "University", "School", "Medicine",
> "Institute" etc. to stopwords list, so you are left with Stanford,
> "Palo Alto" etc.

I don't know that I would remove them, but you might consider the CommonGrams or n-gram
approach, whereby you associate these "stop words" with the words around them.
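
To make the idea concrete, here is a minimal plain-Java sketch of what a common-grams
style token stream looks like. It is not Solr's CommonGramsFilter, only an illustration
of the technique: every token is kept, and a bigram is added whenever either neighbor
is a common word. The class name, the COMMON word list, and the "_" separator are
assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CommonGramsSketch {
    // Hypothetical list of common words; a real list would come from your own data.
    static final Set<String> COMMON =
            Set.of("university", "school", "medicine", "institute", "of");

    // Emits every lowercased token, plus a "left_right" bigram whenever
    // either token of an adjacent pair is in the common-word list.
    static List<String> commonGrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String cur = tokens.get(i).toLowerCase(Locale.ROOT);
            out.add(cur);
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1).toLowerCase(Locale.ROOT);
                if (COMMON.contains(cur) || COMMON.contains(next)) {
                    out.add(cur + "_" + next);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(commonGrams(
                List.of("Stanford", "University", "School", "of", "Medicine")));
    }
}
```

With bigrams such as "stanford_university" in the index, the common words still
contribute to matching, but only in combination with the more distinctive words
next to them.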

> 
> Then use Ahmet's suggestion of using
> BooleanQuery.setMinimumNumberShouldMatch(), set to (say) 75% of the
> query string length.
> 
> Finally, if you wish to be very precise, you can loop through the hits
> collector and use a string comparison algorithm like Jaro-Winkler,
> Levenshtein etc. for a second-level filter.

Note that this approach will be slow, since it runs a string comparison against every hit.
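
For reference, the two quoted steps can be sketched in plain Java without any Lucene
dependency. This is only an illustration under stated assumptions: the class and
method names are hypothetical, "75% of the query string length" is read as 75% of the
number of query terms, rounded down, and hits are represented as plain strings. The
resulting count is the kind of int you would pass to
BooleanQuery.setMinimumNumberShouldMatch().

```java
import java.util.ArrayList;
import java.util.List;

public class SecondLevelFilter {
    // Number of optional clauses that must match, as a fraction of the
    // query's term count. Rounding down and the floor of 1 are assumptions.
    static int minShouldMatch(int termCount, double fraction) {
        return Math.max(1, (int) Math.floor(termCount * fraction));
    }

    // Classic dynamic-programming Levenshtein edit distance, two rows at a time.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Edit distance normalized to [0, 1], where 1.0 means identical strings.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // Keeps only the hits whose similarity to the query meets the threshold.
    // This is the per-hit loop that makes the approach slow on large result sets.
    static List<String> filter(String query, List<String> hits, double threshold) {
        List<String> kept = new ArrayList<>();
        for (String hit : hits) {
            if (similarity(query, hit) >= threshold) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String query = "stanford university school of medicine";
        System.out.println(minShouldMatch(query.split("\\s+").length, 0.75));
        System.out.println(filter(query,
                List.of("stanford university school of medicine",
                        "harvard medical school"),
                0.8));
    }
}
```

If you do go this route, running the comparison only over the top N hits rather than
the full result set keeps the cost bounded.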




--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search

