lucene-java-user mailing list archives

From Grant Ingersoll
Subject Re: Lucene query with long strings
Date Wed, 24 Mar 2010 17:44:56 GMT

On Mar 24, 2010, at 9:20 AM, Shashi Kant wrote:

> Add the common terms such as "University", "School", "Medicine",
> "Institute" etc. to stopwords list, so you are left with Stanford,
> "Palo Alto" etc.

I don't know if I would remove them, but you might consider the CommonGrams or n-gram
approach, whereby you associate these "stop words" with the words around them.
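To make the common-grams idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. It keeps every word and additionally emits a joined pair whenever either neighbor is a common word, so "University" stays searchable but is also tied to "Stanford". The class name, the word list, and the "_" separator are assumptions made for illustration; this is not the actual Lucene/Solr filter, only an approximation of the technique.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CommonGramsSketch {
    // Hypothetical list of common words; a real list would be tuned to the data.
    static final Set<String> COMMON =
            Set.of("university", "school", "medicine", "institute", "of");

    // Emits every word, plus a joined pair for each adjacent pair in which
    // at least one of the two words is in COMMON.
    static List<String> commonGrams(String text) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(words[i]);
            if (i + 1 < words.length
                    && (COMMON.contains(words[i]) || COMMON.contains(words[i + 1]))) {
                out.add(words[i] + "_" + words[i + 1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(commonGrams("Stanford University School of Medicine"));
    }
}
```

With tokens like stanford_university in the index, a query on the full name can match on the pair instead of on the very frequent single word.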

> Then use Ahmet's suggestion of using
> BooleanQuery.setMinimumNumberShouldMatch() set to (say) 75% of the
> query string length.
> Finally, if you wish to be very precise, you can loop through the hits
> collector and use a string comparison algorithm like Jaro-Winkler,
> Levenshtein etc. for a second-level filter.

Note, this approach will be slow.
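For reference, the two quoted steps can be sketched in plain Java without Lucene. The sketch assumes that "75% of the query string length" means 75% of the number of query terms, and it uses Levenshtein distance rather than Jaro-Winkler for the second-level filter. The class name, method names, and the 0.9 threshold are hypothetical; in a real application the hit strings would come from the stored fields of the matching documents.

```java
import java.util.ArrayList;
import java.util.List;

public class SecondLevelFilterSketch {
    // Number of optional clauses that must match, e.g. 75% of the term count
    // (assumed interpretation). Always at least 1.
    static int minShouldMatch(int clauseCount, double fraction) {
        return Math.max(1, (int) Math.floor(clauseCount * fraction));
    }

    // Classic dynamic-programming Levenshtein distance using two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Normalized similarity in [0, 1]; 1.0 means identical strings.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longest;
    }

    // Second-level filter: keep only hits close enough to the query string.
    static List<String> filter(String query, List<String> hits, double threshold) {
        List<String> kept = new ArrayList<>();
        String q = query.toLowerCase();
        for (String hit : hits) {
            if (similarity(q, hit.toLowerCase()) >= threshold) kept.add(hit);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(minShouldMatch(4, 0.75));
        System.out.println(filter("Stanford University",
                List.of("Stanford University", "Stamford University",
                        "Harvard Medical School"), 0.9));
    }
}
```

Each comparison costs time proportional to the product of the two string lengths and runs once per collected hit, which is presumably the cost the note above refers to. Limiting the filter to the top few hits keeps that cost bounded.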

Grant Ingersoll
