lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Commented: (LUCENE-2510) migrate solr analysis factories to analyzers module
Date Wed, 23 Jun 2010 13:35:54 GMT


Robert Muir commented on LUCENE-2510:

Chris, yes I think at a glance this is where I got stuck :)

Related to this, there is already some duplicated resource-loading code that it would be nice
to clean up.

For example, Lucene and Solr each have their own separate stopword-loading code. But I don't
really like some of the things Lucene's WordListLoader does:
* Lucene's WordListLoader builds HashMaps and HashSets, but this is wasteful since these are
always copied into a CharArraySet/CharArrayMap afterwards. Solr's code just builds the
CharArraySet/CharArrayMap up front.
* The Solr file-loading code has some extra features, such as trying to guess the size of the
set/map up front for faster loading.
* The Solr stopword-loading code is more user-friendly, as it ignores byte order marks (BOMs)
and the like.
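A minimal sketch of what a single, shared loader combining those points could look like, written against plain JDK types so it runs on its own. The class and method names (`WordListSketch`, `loadInto`, `estimateCapacity`) are hypothetical, and the "about 8 characters per line" capacity heuristic is an assumption, not the estimate Solr actually uses. Words go directly into the caller's final collection, so no intermediate HashSet is built and then copied, and a leading BOM is dropped explicitly, since `String.trim()` alone does not remove U+FEFF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical single word-list loader (illustration only, not Lucene's or Solr's API).
 * Words are added straight into the caller's final collection, so no intermediate
 * HashSet has to be built and copied afterwards.
 */
public final class WordListSketch {
  // Byte order mark as it appears after decoding (U+FEFF); trim() does not strip it.
  private static final char BOM = '\uFEFF';

  private WordListSketch() {}

  /**
   * Rough initial capacity for the target set, guessed from the input size.
   * Assumption: roughly 8 chars per line on average; this is a heuristic, not a measurement.
   */
  public static int estimateCapacity(long inputChars) {
    return (int) Math.max(16, Math.min(Integer.MAX_VALUE, inputChars / 8));
  }

  /**
   * Reads one word per line, drops a leading BOM, trims whitespace (assumption) and
   * skips blank lines. The target may be any collection that accepts Strings.
   */
  public static <C extends Collection<? super String>> C loadInto(Reader in, C target)
      throws IOException {
    BufferedReader reader = new BufferedReader(in);
    boolean firstLine = true;
    String line;
    while ((line = reader.readLine()) != null) {
      if (firstLine) {
        firstLine = false;
        if (!line.isEmpty() && line.charAt(0) == BOM) {
          line = line.substring(1); // ignore the BOM instead of treating it as part of a word
        }
      }
      String word = line.trim();
      if (!word.isEmpty()) {
        target.add(word);
      }
    }
    return target;
  }

  public static void main(String[] args) throws IOException {
    String text = "\uFEFFthe\nand\n\n  of \n";
    // Presize the final set once, then fill it in a single pass.
    Set<String> words = new HashSet<>(estimateCapacity(text.length()));
    loadInto(new StringReader(text), words);
    System.out.println(words.size()); // prints 3
  }
}
```

Because the target is typed as `Collection<? super String>`, the same method would also accept a `Set<Object>` such as Lucene's CharArraySet (which extends `AbstractSet<Object>`), so the caller, not the loader, chooses the final set type and its initial size.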

I think it would be good to only have one piece of code for this functionality and for it
to be optimal.

> migrate solr analysis factories to analyzers module
> ---------------------------------------------------
>                 Key: LUCENE-2510
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: contrib/analyzers
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>             Fix For: 4.0
> In LUCENE-2413 all TokenStreams were consolidated into the analyzers module.
> This is a good step, but I think the next step is to put the Solr factories into the
> analyzers module, too.
> This would make analyzers artifacts plugins to both lucene and solr, with benefits such as:
> * users could use the old analyzers module with solr, too. This is a good step to use
> real library versions instead of Version for backwards compat.
> * analyzers modules such as smartcn and icu, which aren't currently available to solr
> users due to large file sizes or dependencies, would be simple optional plugins to solr and
> easily available to users who want them.
> Rough sketch in this thread:
> Practically, I haven't looked much and don't really have a plan for how this will work
> yet, so ideas are very welcome.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
