lucene-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: Analyzer thread safety; Stop words
Date Fri, 24 Nov 2006 15:27:29 GMT
On 11/24/06, Antony Bowesman <adb@teamware.com> wrote:
> Two points about Analyzers:
>
> Does anyone have any experience with thread safety of Analyzer implementations?
>   Apart from PerFieldAnalyzerWrapper, the analyzers seem to be thread safe, but
> is there a requirement that analyzers should be thread safe?

Yes, and they normally are thread safe as they create new Tokenizers
and TokenFilters for each field value analyzed.
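
As a rough illustration of that pattern, here is a minimal sketch in plain Java. It does not use Lucene's classes: SketchAnalyzer, SketchTokenizer and SharedAnalyzerDemo are hypothetical stand-ins, and the whitespace splitting is only a placeholder. The point it tries to show is that the analyzer holds no mutable state of its own, while each call hands out a fresh object that owns the per-use cursor, so one analyzer instance can be shared across threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a tokenizer: owns the mutable, per-use state.
final class SketchTokenizer {
    private final String text;
    private int pos = 0; // cursor into the text; never shared between threads

    SketchTokenizer(String text) {
        this.text = text;
    }

    // Returns the next whitespace-delimited token, or null at end of input.
    String next() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        if (pos >= text.length()) {
            return null;
        }
        int start = pos;
        while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return text.substring(start, pos);
    }
}

// Hypothetical stand-in for an analyzer: it has no mutable fields.
final class SketchAnalyzer {
    // A new tokenizer is created for every field value analyzed.
    SketchTokenizer tokenStream(String fieldName, String text) {
        return new SketchTokenizer(text);
    }

    List<String> analyze(String fieldName, String text) {
        List<String> tokens = new ArrayList<>();
        SketchTokenizer stream = tokenStream(fieldName, text);
        for (String tok = stream.next(); tok != null; tok = stream.next()) {
            tokens.add(tok);
        }
        return tokens;
    }
}

final class SharedAnalyzerDemo {
    // Analyzes every text on a small thread pool using ONE shared analyzer.
    static List<List<String>> analyzeConcurrently(SketchAnalyzer analyzer, List<String> texts) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String text : texts) {
                Callable<List<String>> task = () -> analyzer.analyze("body", text);
                futures.add(pool.submit(task));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get()); // results come back in submission order
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

An analyzer that cached a single tokenizer in a field and reused it across calls would lose this property, because two threads would then advance the same cursor.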

> Secondly, has anyone thought that it would be a good idea to extend the Analyzer
> interface (Abstract class) to allow a standard way to set stop words?  There
> seem to be two 'families' of stop word configuration via constructors.

That belongs at the TokenFilter level (where it currently is).

> There are the Set, File and String[] constructors in Analyzers such as
> StandardAnalyzer and StopAnalyzer, and then the Russian/Greek variants, which do
> not have the same constructor signatures to configure stop words.
>
> It makes it messy to make analyzers pluggable in a generic way so that stopwords
> can be configurable for any plugged analyzer.

Things currently are pluggable: one makes new Analyzers by plugging
together a Tokenizer followed by several TokenFilters.
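
To make the idea concrete, here is a small sketch of that kind of chaining in plain Java. These are not Lucene's classes or signatures: TokenSource, WhitespaceSource, LowerCaseStage, StopStage and ChainAnalyzer are hypothetical names, chosen only to show a source stage wrapped by filter stages, with the stop-word set handed to the filter that uses it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a token stream: returns null when exhausted.
interface TokenSource {
    String next();
}

// Source stage: splits the input on whitespace.
final class WhitespaceSource implements TokenSource {
    private final String[] parts;
    private int i = 0;

    WhitespaceSource(String text) {
        String trimmed = text.trim();
        this.parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public String next() {
        return i < parts.length ? parts[i++] : null;
    }
}

// Filter stage: lower-cases each token from the wrapped source.
final class LowerCaseStage implements TokenSource {
    private final TokenSource in;

    LowerCaseStage(TokenSource in) {
        this.in = in;
    }

    public String next() {
        String t = in.next();
        return t == null ? null : t.toLowerCase(Locale.ROOT);
    }
}

// Filter stage: drops stop words. The stop-word set is configured here,
// at the filter level, not on the analyzer interface.
final class StopStage implements TokenSource {
    private final TokenSource in;
    private final Set<String> stopWords;

    StopStage(TokenSource in, Set<String> stopWords) {
        this.in = in;
        this.stopWords = stopWords;
    }

    public String next() {
        for (String t = in.next(); t != null; t = in.next()) {
            if (!stopWords.contains(t)) {
                return t;
            }
        }
        return null;
    }
}

// An "analyzer" is then just a recipe for wiring the stages together.
final class ChainAnalyzer {
    private final Set<String> stopWords;

    ChainAnalyzer(Set<String> stopWords) {
        this.stopWords = new HashSet<>(stopWords); // defensive copy, read-only afterwards
    }

    // Builds a fresh chain for each piece of text.
    TokenSource tokenStream(String text) {
        return new StopStage(new LowerCaseStage(new WhitespaceSource(text)), stopWords);
    }

    List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        TokenSource stream = tokenStream(text);
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }
}
```

In this arrangement, supporting a different stop list or a different language means writing another small wiring class around the same stages, which is why a stop-word setter on the analyzer base class is not needed.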

If you are talking about some sort of external configuration, take a
look at Solr.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
