lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steven A Rowe" <sar...@syr.edu>
Subject RE: Opposite to StopFilter. Anything already implemented out there?
Date Tue, 22 Jul 2008 14:03:05 GMT
Hi Martin,

On 07/22/2008 at 5:48 AM, mpermar wrote:
> I want to index some incoming text. In this case what I want
> to do is just detect keywords in that text. Therefore I want
> to discard everything that is not in the keywords set. This
> sounds to me pretty much like the reverse of using stop words,
> that is it I want to use a set of "accepted" words.
> 
> So I planned to create a new filter that just checks that
> incoming words are in the "acceptable set" and discards them
> otherwise. Are you aware of any analyzer/filter out there that
> uses this approach? Is there any other better way to do this?

Solr has KeepWordFilter - it sounds exactly like what you want: 

Javadoc: <http://lucene.apache.org/solr/api/org/apache/solr/analysis/KeepWordFilter.html>
Source: <http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/analysis/KeepWordFilter.java?view=markup>

Depending on your requirements and the nature of your keywords list, you might consider applying
this filter only to queries, rather than at index time.  That way, the keyword list can change
without having to re-index.

Steve

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message