lucy-dev mailing list archives

From Marvin Humphrey
Subject [lucy-dev] Promoting new analysis components
Date Fri, 23 Dec 2011 03:18:33 GMT
On Thu, Dec 22, 2011 at 11:31:21PM +0100, Nick Wellnhofer wrote:
> On 22/12/11 06:15, Marvin Humphrey wrote:
>> Can you please create a JIRA issue for this, Nick?  The reason is that our
>> CHANGES file is a list of JIRA issues, and we want people to be able to see
>> that EasyAnalyzer was added in 0.3.0 and get a link to an explanation.
> Done.
>> One thought: We may not want to have EasyAnalyzer inherit from PolyAnalyzer.
>> It's fine, because it works for now... but how about we at least override
>> Dump/Load to store only "_class" and "language"?  If we refactor the
>> Analyzer chain, PolyAnalyzer, since it allows *any* Analyzer to be first --
>> not just a Tokenizer -- may not survive the refactoring in its current
>> form.
> OK, I changed that in commit r1222495.

Looks great, Nick!
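
As a rough illustration of the Dump/Load idea quoted above (store only "_class" and "language" rather than the whole chain), here is a minimal Python sketch. It is a toy stand-in, not Lucy's actual implementation: the Python class and its lowercase dump/load methods are hypothetical, and only the two key names come from the discussion.

```python
# Illustrative stand-in only: not Lucy's actual implementation.
class EasyAnalyzer:
    """Toy analyzer whose serialized form is just its class name and language."""

    def __init__(self, language):
        # A real analyzer would build its internal chain here from `language`.
        self.language = language

    def dump(self):
        # Store only "_class" and "language"; the internal chain is not serialized.
        return {"_class": type(self).__name__, "language": self.language}

    @classmethod
    def load(cls, dump):
        # Rebuild from the minimal dump; internals are reconstructed by __init__.
        if dump["_class"] != cls.__name__:
            raise ValueError(f"unexpected class: {dump['_class']}")
        return cls(language=dump["language"])


original = EasyAnalyzer(language="en")
serialized = original.dump()
restored = EasyAnalyzer.load(serialized)
```

Because the dump carries no description of the chain itself, the internals could presumably be refactored later without changing the serialized form.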

Now that EasyAnalyzer is in, I think we should promote the use of all the
improvements Nick has made to the analysis chain.

  * Swap in EasyAnalyzer for PolyAnalyzer, Normalizer for CaseFolder, and
    StandardTokenizer for RegexTokenizer everywhere we can.
  * Deprecate the "language" parameter to PolyAnalyzer#new.

By "deprecate", I mean:

  * Open a JIRA issue so that a suitably titled entry ends up in the CHANGES
    file.
  * Mark the "language" param as "deprecated" in the PolyAnalyzer docs.

We don't have a strong deprecation mechanism available to us right now, so I
think that's the best we can do.

Here are some of the files which are going to need documentation changes
because they promote the old Analyzers:


It's not important that any of these changes happen before 0.3.0.  The docs
changes can happen at any time, and the parameter deprecation only allows the
simplification of a single class (PolyAnalyzer itself).  It would also be nice
to switch most test cases to use the new Analyzers, but that can also happen
at any time.

In contrast, here are a couple of changes we should *not* make prior to 0.3.0,
because they have index compatibility implications:

  * Change Lucy::Simple to use EasyAnalyzer instead of PolyAnalyzer.
  * Implement CaseFolder as a subclass of Normalizer.

Marvin Humphrey
