incubator-lucy-dev mailing list archives

From Nick Wellnhofer <wellnho...@aevum.de>
Subject Re: [lucy-dev] Promoting new analysis components
Date Fri, 10 Feb 2012 13:05:26 GMT
On 09/02/2012 02:49, Marvin Humphrey wrote:
> On Wed, Feb 08, 2012 at 05:04:56PM +0100, Nick Wellnhofer wrote:
>> On 23/12/2011 04:18, Marvin Humphrey wrote:
>>>     * Implement CaseFolder as a subclass of Normalizer.
>>
>> This has yet to be done. We could also mark the CaseFolder as deprecated
>> and remove it completely later.
>
> The cost for keeping CaseFolder around in its current form is high, because it
> is tied into a perlapi function and thus needs a per-host implementation. (The
> perlapi function's name broke in late Perl 5.15 releases, which was a PITA to
> troubleshoot).  In contrast, the cost for keeping CaseFolder around is small
> if it becomes a subclass of Normalizer.
>
> However, CaseFolder and Normalizer presumably have slightly different case
> mappings, thus the subclassing change is a back compat break.  It shouldn't be
> a horrible break (depending on how close the mappings are) because it will
> only affect search-time, screwing up the results only for terms which contain
> code points whose mapping has changed.
>
> I don't think we should outright remove CaseFolder without a really good
> reason, because that will force almost all of our users to change their code
> and then reindex from scratch.  But a subtle compat break might be OK,
> especially since you can update all the docs in place after upgrading and only
> suffer during a window of time from slightly degraded search results.

The original plan was to implement CaseFolder as a subclass of 
Normalizer, but I think that doesn't play well with the Dump/Load 
functions. Composition might be a better approach.
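
To illustrate the composition idea, here is a minimal Python sketch. It is not Lucy code: the class layout, the `_class` key, the `form` and `case_fold` parameters, and the NFKC default are all assumptions made for the example. The point is that a wrapping CaseFolder can keep its own small dump format and build its Normalizer internally, whereas a subclass would inherit a dump/load pair that emits and expects the Normalizer's settings.

```python
import unicodedata


class Normalizer:
    """Hypothetical stand-in for a Unicode normalizer with optional case folding."""

    def __init__(self, form="NFKC", case_fold=True):
        self.form = form
        self.case_fold = case_fold

    def transform(self, text):
        text = unicodedata.normalize(self.form, text)
        return text.casefold() if self.case_fold else text

    def dump(self):
        # The serialized form carries the normalizer's settings.
        return {"_class": "Normalizer", "form": self.form,
                "case_fold": self.case_fold}

    @classmethod
    def load(cls, data):
        return cls(form=data["form"], case_fold=data["case_fold"])


class CaseFolder:
    """Composition: wraps a Normalizer instead of inheriting from it."""

    def __init__(self):
        # The wrapped normalizer is an internal detail, fixed at construction.
        self._normalizer = Normalizer(form="NFKC", case_fold=True)

    def transform(self, text):
        return self._normalizer.transform(text)

    def dump(self):
        # No normalizer settings leak into the dump (assumed format).
        return {"_class": "CaseFolder"}

    @classmethod
    def load(cls, data):
        return cls()
```

With this layout, `CaseFolder.load(CaseFolder().dump())` round-trips without touching the Normalizer's serialized fields.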

Nick
