lucene-java-user mailing list archives

From Ian Lea
Subject Re: Analysers for newspaper pages...
Date Mon, 28 Nov 2011 20:51:13 GMT
You can easily use just the CommonGrams stuff from Solr in your pure
lucene project.
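
To illustrate what the common grams idea does to a token stream at index
time, here is a minimal plain-Java sketch. It does not use the Solr classes:
the class name CommonGramsSketch and the method commonGrams are made up for
illustration, and the "_" joiner is an assumption meant to mirror how the
Solr filter glues the two words of a bigram together. Every token is kept,
and a bigram is added wherever either word of an adjacent pair is a common
word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Lucene or Solr.
final class CommonGramsSketch {
    private CommonGramsSketch() {}

    // Returns the original tokens with a joined bigram added after each
    // token whose pair with the following token contains a common word.
    static List<String> commonGrams(List<String> tokens, Set<String> commonWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current); // single terms are still kept
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                // Form a bigram if either word of the pair is common.
                if (commonWords.contains(current) || commonWords.contains(next)) {
                    out.add(current + "_" + next); // "_" joiner is an assumption
                }
            }
        }
        return out;
    }
}
```

With the tokens "the", "quick", "fox" and "the" as the only common word, this
yields "the", "the_quick", "quick", "fox". A phrase search containing a stop
word can then be matched against the much rarer term "the_quick" rather than
against every page that contains "the".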

There are a couple of useful docs on stop words and common grams et al at


On Mon, Nov 28, 2011 at 8:31 PM, Dawn Zoë Raison wrote:
> Hi Steve,
> On 28/11/2011 19:43, Steven A Rowe wrote:
>> I assume that when you refer to "the impact of stop words," you're
>> concerned about query-time performance?  You should consider the possibility
>> that performance without removing stop words is good enough that you won't
>> have to take any steps to address the issue.
> Not too fussed about query-time performance; certainly no-one has complained
> so far. It's more the sheer number of junk pages we get searching on phrases
> that contain stop words - it can lead to hundreds of thousands of results,
> and the pedants among our userbase insist on paging through the lot :-|
> I'd much rather contain the stop words using a *gram based approach and
> offer a less populous but more accurate resultset.
>> That said, there are two filters in Solr 3.X[1] that would do the
>> equivalent of what you have outlined:
>> CommonGramsFilter
>>  and
>> CommonGramsQueryFilter.
> We use lucene directly, but I'll take a look - Thanks.
>> You can use these filters with a Lucene 3.X application by including the
>> (same-versioned) solr-core jar as a dependency.
>> Steve
> --
> Rgds.
> *Dawn Raison*
