lucy-dev mailing list archives

From: Marvin Humphrey <mar...@rectangular.com>
Subject: Re: [lucy-dev] Unicode integration
Date: Wed, 23 Nov 2011 06:37:04 GMT

On Wed, Nov 16, 2011 at 11:24:22PM +0100, Nick Wellnhofer wrote:
> If we go with utf8proc, I would propose a new analyzer  
> Lucy::Analysis::Normalizer with the following interface:
>
> my $normalizer = Lucy::Analysis::Normalizer->new(
>     normalization_form => $string,
>     case_fold          => $bool,
>     strip_accents      => $bool,
> );

For the benefit of those who are not subscribed to the lucy-issues list[1], I
wanted to pass along that Nick has followed through with a full, portable C
implementation of Lucy::Analysis::Normalizer, with proper documentation,
tests... the whole nine yards.

    https://issues.apache.org/jira/browse/LUCY-191

Things could hardly have gone better or more according to the "Apache Way".
Nick did not let himself be held back by either the redaction of the Analyzer
subclassing API or the dependency constraints he was asked to work within. He
diverted the discussion from the user list to the dev list at the appropriate
moment, proposed an interface and the basic shape of an implementation, built
consensus for his proposal, then coded up his contribution with hardly any
help and delivered a solid patch.

And then as an encore, yesterday Nick submitted a patch to solve our current
Highlighter bug.

Bravo, Nick!

Marvin Humphrey

[1] The lucy-issues list gets notifications from our JIRA issue tracker.
    Significant design decisions must always be undertaken on the dev list, so
    conversations in the issue tracker are limited to implementation
    discussions. <http://incubator.apache.org/lucy/mailing_lists.html>
