lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-2510) migrate solr analysis factories to analyzers module
Date Mon, 07 May 2012 11:59:48 GMT


Robert Muir commented on LUCENE-2510:

re: what is the purpose of the newInstance method?

If you take a look at org.apache.solr.analysis.DelimitedPayloadTokenFilterFactory you'll see
an example of how it's used.

Looking at the implementation in SolrResourceLoader, it seems to facilitate two things:

* The use of simplified solr.* package names
* In FSTSynonymFilterFactory, for example, newInstance is used to load other components. Consequently,
SolrResourceLoader adds the instantiated classes to its tracking of SolrCoreAware,
ResourceLoaderAware, etc.
With all that said, it's only used in 3 factories (but a lot of other Solr code). Perhaps we
can break it out somehow.

I think we should revisit this. I don't like placing this into the analyzers module when few
factories actually use it and a lot of unrelated code in Solr does.
I think this could cause a mess.

On the other hand, both of the things this provides can be achieved in other ways. For example,
if we use NamedSPILoader instead to allow components such as factories to be found by name,
then we can support "solr.WhitespaceTokenizerFactory" because TokenizerFactory.forName("WhitespaceTokenizerFactory")
works. Using the SPI mechanism would give us completely pluggable analysis modules, and
operations like listAll() would work in case you want to enumerate a list (imagine someone
who doesn't want an XML configuration but wants to configure things through a GUI or something
like that instead). We also keep sane packaging within the analysis modules and keep type
safety, and Solr still keeps its solr.XXX syntax without reflecting a zillion packages or
other crazy things.
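
To make the lookup-by-name idea concrete, here is a minimal sketch in plain Java. It is not the Lucene or Solr API: SpiLookupSketch, NamedRegistry, TokenizerFactorySketch and stripSolrPrefix are hypothetical names invented for illustration, and the factories are registered by hand, whereas a real SPI loader would presumably discover them from the classpath (for example via java.util.ServiceLoader).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch only; none of these types exist in Lucene or Solr.
class SpiLookupSketch {

    /** Stand-in for a tokenizer factory base type. */
    interface TokenizerFactorySketch {
        String describe();
    }

    /** Stand-in for one concrete factory. */
    static final class WhitespaceSketch implements TokenizerFactorySketch {
        @Override
        public String describe() {
            return "whitespace";
        }
    }

    /** Name-to-constructor registry: the role an SPI loader would play. */
    static final class NamedRegistry<T> {
        private final Map<String, Supplier<? extends T>> byName = new LinkedHashMap<>();

        void register(String name, Supplier<? extends T> ctor) {
            byName.put(name, ctor);
        }

        /** Type-safe lookup by simple name; no package scanning or reflection. */
        T forName(String name) {
            Supplier<? extends T> ctor = byName.get(name);
            if (ctor == null) {
                throw new IllegalArgumentException(
                    "Unknown name '" + name + "'; known names: " + byName.keySet());
            }
            return ctor.get();
        }

        /** Enumerates registered names, e.g. to populate a GUI instead of XML. */
        Set<String> listAll() {
            return Collections.unmodifiableSet(byName.keySet());
        }
    }

    /** Maps the Solr-style "solr.XXX" shorthand to the plain registered name. */
    static String stripSolrPrefix(String configured) {
        String prefix = "solr.";
        return configured.startsWith(prefix)
            ? configured.substring(prefix.length())
            : configured;
    }

    /** Hand registration; an assumption standing in for classpath discovery. */
    static NamedRegistry<TokenizerFactorySketch> defaultRegistry() {
        NamedRegistry<TokenizerFactorySketch> registry = new NamedRegistry<>();
        registry.register("WhitespaceTokenizerFactory", WhitespaceSketch::new);
        return registry;
    }

    public static void main(String[] args) {
        NamedRegistry<TokenizerFactorySketch> registry = defaultRegistry();
        String name = stripSolrPrefix("solr.WhitespaceTokenizerFactory");
        System.out.println(registry.listAll());
        System.out.println(registry.forName(name).describe());
    }
}
```

The point of the sketch is that the "solr." shorthand reduces to a string prefix on top of an ordinary name lookup, so nothing Solr-specific would need to live in the analysis module itself.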

> migrate solr analysis factories to analyzers module
> ---------------------------------------------------
>                 Key: LUCENE-2510
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: modules/analysis
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>             Fix For: 4.0
>         Attachments: LUCENE-2510-parent-classes.patch, LUCENE-2510-parent-classes.patch,
LUCENE-2510-resourceloader-bw.patch, LUCENE-2510.patch, LUCENE-2510.patch, LUCENE-2510.patch
> In LUCENE-2413 all TokenStreams were consolidated into the analyzers module.
> This is a good step, but I think the next step is to put the Solr factories into the
analyzers module, too.
> This would make analyzers artifacts plugins to both lucene and solr, with benefits such as:
> * users could use the old analyzers module with solr, too. This is a good step to use
real library versions instead of Version for backwards compat.
> * analyzers modules such as smartcn and icu, that aren't currently available to solr
users due to large file sizes or dependencies, would be simple optional plugins to solr and
easily available to users that want them.
> Rough sketch in this thread:
> Practically, I haven't looked much and don't really have a plan for how this will work
yet, so ideas are very welcome.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:!default.jspa
For more information on JIRA, see:

