lucy-dev mailing list archives

From Nick Wellnhofer <wellnho...@aevum.de>
Subject Re: [lucy-dev] Promoting new analysis components
Date Thu, 09 Feb 2012 09:39:28 GMT
On 09/02/2012 02:49, Marvin Humphrey wrote:
> After reviewing the Lucy::Simple code, I realized that we can avoid breaking
> compat with only a few extra lines.
>
>    * If the index exists during new(), extract the schema and type from what's
>      on disk.
>    * Otherwise, create a new EasyAnalyzer for the type.
>
> That way, we avoid a schema conflict crash when indexes built by Lucy::Simple
> prior to 0.4.0 are read by 0.4.0 or above.

That's a good idea.
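
For illustration, the open-or-create logic described above could look roughly like the Python sketch below. This is not Lucy::Simple code: the `schema.json` file name, the dictionary layout, and the `open_simple_index` function are all hypothetical stand-ins, and only the name EasyAnalyzer comes from the proposal.

```python
import json
import os

SCHEMA_FILE = "schema.json"        # hypothetical file name, not Lucy's on-disk layout
DEFAULT_ANALYZER = "EasyAnalyzer"  # the analyzer proposed for newly created indexes


def open_simple_index(path, language):
    """Return the schema for the index at `path`, creating one if needed."""
    schema_path = os.path.join(path, SCHEMA_FILE)
    if os.path.exists(schema_path):
        # The index already exists: reuse the schema it was built with,
        # so an older analyzer is kept and no schema conflict arises.
        with open(schema_path, encoding="utf-8") as fh:
            return json.load(fh)
    # No index yet: create a fresh schema with the new default analyzer.
    schema = {"analyzer": DEFAULT_ANALYZER, "language": language}
    os.makedirs(path, exist_ok=True)
    with open(schema_path, "w", encoding="utf-8") as fh:
        json.dump(schema, fh)
    return schema
```

Under these assumptions, an index written by an older release keeps its original analyzer, and only newly created indexes get the new default.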

> However, CaseFolder and Normalizer presumably have slightly different case
> mappings, thus the subclassing change is a back compat break.  It shouldn't be
> a horrible break (depending on how close the mappings are) because it will
> only affect search-time, screwing up the results only for terms which contain
> code points whose mapping has changed.

The German sharp s ("ß") is handled differently by the CaseFolder and 
the Normalizer. The CaseFolder leaves it untouched, whereas the 
Normalizer converts it to "ss". Fortunately, the snowball stemmer also 
converts sharp s to "ss", so many users should be fine.
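
A small Python sketch shows the kind of difference involved. It does not call Lucy: Python's `str.lower()` and `str.casefold()` are used here only as stand-ins for a lowercasing analyzer and a case-folding one, and `mismatched_terms` is a hypothetical helper.

```python
def fold_lowercase(term):
    # Stand-in for an analyzer that lowercases: sharp s is left untouched.
    return term.lower()


def fold_casefold(term):
    # Stand-in for an analyzer that applies Unicode case folding:
    # sharp s is expanded to "ss".
    return term.casefold()


def mismatched_terms(terms):
    # Terms that the two stand-ins map differently. With mixed analyzers,
    # only these terms would fail to match at search time.
    return [t for t in terms if fold_lowercase(t) != fold_casefold(t)]
```

With these stand-ins, "Straße" becomes "straße" under lowercasing and "strasse" under case folding, while a term like "Haus" maps to "haus" either way.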

> I don't think we should outright remove CaseFolder without a really good
> reason, because that will force almost all of our users to change their code
> and then reindex from scratch.  But a subtle compat break might be OK,
> especially since you can update all the docs in place after upgrading and only
> suffer during a window of time from slightly degraded search results.

+1

Nick
