lucene-dev mailing list archives

From "Yonik Seeley" <>
Subject Re: Analyzer thread safety; Stop words
Date Fri, 24 Nov 2006 15:27:29 GMT
On 11/24/06, Antony Bowesman <> wrote:
> Two points about Analyzers:
> Does anyone have any experience with thread safety of Analyzer implementations?
>   Apart from PerFieldAnalyzerWrapper, the analyzers seem to be thread safe, but
> is there a requirement that analyzers should be thread safe?

Yes, and they normally are thread safe as they create new Tokenizers
and TokenFilters for each field value analyzed.
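
A minimal sketch of why that pattern is thread safe, written in plain Java
rather than against the Lucene classes. Every name here
(SimpleTokenStream, WhitespaceLowerCaseAnalyzer, AnalyzerThreadSafetyDemo)
is hypothetical and only models the idea: the analyzer itself holds no
mutable state, and each call hands back a fresh stream that owns its own
position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a token stream; not a Lucene class.
interface SimpleTokenStream {
    String next(); // returns null when the stream is exhausted
}

// Hypothetical analyzer with no mutable fields, so it can be shared freely.
final class WhitespaceLowerCaseAnalyzer {
    SimpleTokenStream tokenStream(String text) {
        // A new stream per call: all per-value state lives in the stream.
        return new Stream(text);
    }

    private static final class Stream implements SimpleTokenStream {
        private final String[] parts;
        private int pos = 0; // only ever touched by the thread that owns this stream

        Stream(String text) {
            String trimmed = text.trim();
            parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        public String next() {
            return pos < parts.length ? parts[pos++].toLowerCase(Locale.ROOT) : null;
        }
    }
}

class AnalyzerThreadSafetyDemo {
    // Drains one stream into a list of tokens.
    static List<String> analyze(WhitespaceLowerCaseAnalyzer analyzer, String text) {
        List<String> tokens = new ArrayList<>();
        SimpleTokenStream stream = analyzer.tokenStream(text);
        for (String t = stream.next(); t != null; t = stream.next()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) throws Exception {
        final WhitespaceLowerCaseAnalyzer shared = new WhitespaceLowerCaseAnalyzer();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> results = new ArrayList<>();
            for (int i = 0; i < 4; i++) {
                final String text = "Field Value " + i;
                // Four tasks share one analyzer instance.
                Callable<List<String>> task = () -> analyze(shared, text);
                results.add(pool.submit(task));
            }
            for (Future<List<String>> f : results) {
                System.out.println(f.get());
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

An analyzer that cached a single stream in a field and reused it across
calls would lose this property, which is the kind of thing to look for
when checking a custom implementation.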

> Secondly, has anyone thought that it would be a good idea to extend the Analyzer
> interface (Abstract class) to allow a standard way to set stop words?  There
> seem to be two 'families' of stop word configuration via constructors.

That belongs at the TokenFilter level (where it currently is).

> One family takes a Set, File or String[], as in StandardAnalyzer and
> StopAnalyzer; the other is the Russian/Greek variants, which do not have the
> same constructor signature to configure stop words.
> It makes it messy to make analyzers pluggable in a generic way so that stopwords
> can be configurable for any plugged analyzer.

Things currently are pluggable: one makes new Analyzers by plugging
together a Tokenizer followed by several TokenFilters.
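
To make the plugging-together concrete, here is a rough sketch in plain
Java. It does not use the Lucene API; TokenSource, WhitespaceSource,
LowerCaseStage, StopWordStage and ConfigurableStopAnalyzer are all
hypothetical names standing in for a Tokenizer, two TokenFilters and an
Analyzer. The point it tries to show is that the stop words are an
argument to the filter stage, and the analyzer is just the place where
the chain is assembled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a token stream; not a Lucene class.
interface TokenSource {
    String next(); // returns null when exhausted
}

// Plays the role of a Tokenizer: splits raw text on whitespace.
final class WhitespaceSource implements TokenSource {
    private final String[] parts;
    private int pos = 0;

    WhitespaceSource(String text) {
        String trimmed = text.trim();
        parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public String next() {
        return pos < parts.length ? parts[pos++] : null;
    }
}

// Plays the role of a TokenFilter: lower-cases each token from its input.
final class LowerCaseStage implements TokenSource {
    private final TokenSource input;

    LowerCaseStage(TokenSource input) {
        this.input = input;
    }

    public String next() {
        String t = input.next();
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}

// Plays the role of a stop-word TokenFilter: the stop set is configured here.
final class StopWordStage implements TokenSource {
    private final TokenSource input;
    private final Set<String> stopWords;

    StopWordStage(TokenSource input, Set<String> stopWords) {
        this.input = input;
        this.stopWords = stopWords;
    }

    public String next() {
        // Skip tokens until one is not a stop word, or the input runs out.
        for (String t = input.next(); t != null; t = input.next()) {
            if (!stopWords.contains(t)) {
                return t;
            }
        }
        return null;
    }
}

// Plays the role of an Analyzer: only wires the stages together.
final class ConfigurableStopAnalyzer {
    private final Set<String> stopWords; // immutable, so safe to share

    ConfigurableStopAnalyzer(Collection<String> stopWords) {
        this.stopWords = Collections.unmodifiableSet(new HashSet<>(stopWords));
    }

    TokenSource tokenStream(String text) {
        return new StopWordStage(new LowerCaseStage(new WhitespaceSource(text)), stopWords);
    }
}

class StopWordChainDemo {
    static List<String> drain(TokenSource source) {
        List<String> tokens = new ArrayList<>();
        for (String t = source.next(); t != null; t = source.next()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        ConfigurableStopAnalyzer analyzer =
                new ConfigurableStopAnalyzer(Arrays.asList("the", "a"));
        // prints [quick, fox, jumped, over, log]
        System.out.println(drain(analyzer.tokenStream("The quick fox jumped over a log")));
    }
}
```

With this shape, making stop words configurable for a different language
means writing another small class that builds a different chain, rather
than adding a setter to a common base class.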

If you are talking about some sort of external configuration, take a
look at Solr.

-Yonik
Solr, the open-source Lucene search server
