lucene-java-user mailing list archives

From Dawn Zoë Raison <d...@digitorial.co.uk>
Subject Re: Analysers for newspaper pages...
Date Mon, 28 Nov 2011 20:31:48 GMT
Hi Steve,

On 28/11/2011 19:43, Steven A Rowe wrote:
> I assume that when you refer to "the impact of stop words," you're concerned about query-time
> performance?  You should consider the possibility that performance without removing stop words
> is good enough that you won't have to take any steps to address the issue.
Not too fussed about query-time performance; certainly no-one has 
complained so far. It's more the sheer number of junk pages we get 
searching on phrases that contain stop words - it can lead to hundreds 
of thousands of results, and the pedants among our userbase insist on 
paging through the lot :-|

I'd much rather contain the stop words using a *gram based approach and 
offer a less populous but more accurate resultset.
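
To make the idea concrete, here is a rough sketch in plain Java of what I mean by a *gram based approach: each stop word is kept, but glued to its neighbour, so a phrase containing a stop word has to match the pair rather than the bare word. The class name, the stop word list and the "_" joiner are all made up for illustration, and this is not the Solr filter's implementation, just the general technique.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CommonGramsSketch {

    // Hypothetical stop word list, chosen only for illustration.
    static final Set<String> COMMON =
            new HashSet<String>(Arrays.asList("the", "of", "a", "in", "to"));

    // Emits every token unchanged and, whenever a token or its successor
    // is a common word, an extra "current_next" bigram token after it.
    static List<String> commonGrams(List<String> tokens) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current);
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (COMMON.contains(current) || COMMON.contains(next)) {
                    out.add(current + "_" + next);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [the, the_times, times, times_of, of, of_london, london]
        System.out.println(commonGrams(Arrays.asList("the", "times", "of", "london")));
    }
}
```

A phrase query would then be run against the paired tokens (the_times, times_of, of_london), which are far rarer than the bare stop words, so the result set should shrink accordingly.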

>
> That said, there are two filters in Solr 3.X[1] that would do the equivalent of what
> you have outlined: CommonGramsFilter <http://lucene.apache.org/solr/api/org/apache/solr/analysis/CommonGramsFilter.html>
> and CommonGramsQueryFilter <http://lucene.apache.org/solr/api/org/apache/solr/analysis/CommonGramsQueryFilter.html>.
We use Lucene directly, but I'll take a look - Thanks.

> You can use these filters with a Lucene 3.X application by including the (same-versioned)
> solr-core jar as a dependency.
>
> Steve

-- 

Rgds.
*Dawn Raison*

