lucy-dev mailing list archives

From Marvin Humphrey
Subject Re: [lucy-dev] Promoting new analysis components
Date Fri, 10 Feb 2012 18:41:21 GMT
On Fri, Feb 10, 2012 at 02:05:26PM +0100, Nick Wellnhofer wrote:
> The original plan was to implement CaseFolder as a subclass of  
> Normalizer, but I think that doesn't play well with the Dump/Load  
> functions. Composition might be a better approach.

Composition is a fine approach as well, so +1 if that's your preference.

As an academic exercise, though, I'd like to explore how Dump/Load might still
work under subclassing.

If the subclassed CaseFolder were a brand new class, it would be fine for it
to inherit Dump/Load from Normalizer.  The class name is part of the dump
data; aside from that, everything else would be the same:

    {
        "_class": "Lucy::Analysis::CaseFolder",    # <--------
        "case_fold": 1,
        "normalization_form": "NFKC",
        "strip_accents": 0
    }

The problem is that there are schema files out there in the wild which contain
serialized CaseFolders with dump data that won't satisfy Normalizer's
implementation of Load():

    # Missing "case_fold", "normalization_form", and "strip_accents".
    {
        "_class": "Lucy::Analysis::CaseFolder"
    }
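
To make the failure concrete, here is a minimal standalone sketch of a strict loader check. It assumes that Normalizer's Load() requires all three settings to be present. The `Dump` struct and both function names are hypothetical stand-ins for illustration, not Lucy's actual API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for parsed dump data: just the list of keys. */
typedef struct {
    const char **keys;
    size_t       num_keys;
} Dump;

/* Returns true if `key` appears in the dump. */
bool
dump_has_key(const Dump *dump, const char *key) {
    for (size_t i = 0; i < dump->num_keys; i++) {
        if (strcmp(dump->keys[i], key) == 0) { return true; }
    }
    return false;
}

/* Assumed behavior of a strict Load(): every setting must be present. */
bool
normalizer_can_load(const Dump *dump) {
    static const char *required[] = {
        "case_fold", "normalization_form", "strip_accents"
    };
    size_t num_required = sizeof(required) / sizeof(required[0]);
    for (size_t i = 0; i < num_required; i++) {
        if (!dump_has_key(dump, required[i])) { return false; }
    }
    return true;
}
```

Under that assumption, a dump holding only "_class" is rejected, while a dump carrying all four keys is accepted.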

Therefore, we need to override Load() in the subclassed CaseFolder.  We can't
invoke the super Load() method, but that's OK -- we can go through
CaseFolder_init() to flesh out the object:

    CaseFolder*
    CaseFolder_load(CaseFolder *self, Obj *dump) {
        return CaseFolder_init(self);
    }

CaseFolder_init() will invoke Normalizer_init() in a standard subclass
implementation, doing the work that invoking the superclass's Load() ordinarily
would have done.
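
A rough standalone sketch of that init chain follows. The struct layouts, the lowercase function names, and their signatures are simplified assumptions made for illustration; the real Lucy definitions differ. The fixed settings (NFKC, case folding on, accent stripping off) are taken from the dump example earlier in this message.

```c
#include <stddef.h>

/* Hypothetical, simplified layouts; not the real Lucy structs. */
typedef struct {
    const char *normalization_form;
    int         case_fold;
    int         strip_accents;
} Normalizer;

typedef struct {
    Normalizer base;   /* superclass members come first */
} CaseFolder;

/* Stand-in for the superclass initializer. */
Normalizer*
normalizer_init(Normalizer *self, const char *form, int case_fold,
                int strip_accents) {
    self->normalization_form = form;
    self->case_fold          = case_fold;
    self->strip_accents      = strip_accents;
    return self;
}

/* Subclass initializer: supplies fixed settings to the superclass. */
CaseFolder*
casefolder_init(CaseFolder *self) {
    normalizer_init(&self->base, "NFKC", 1, 0);
    return self;
}

/* Load override: ignores the dump, since legacy dumps carry only
 * "_class", and builds the object through the initializer instead. */
CaseFolder*
casefolder_load(CaseFolder *self, const void *dump) {
    (void)dump;
    return casefolder_init(self);
}
```

Because the loader never reads the dump, it accepts legacy dumps and full dumps alike and always yields the same settings.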

We should also override Dump() in the subclass (just keeping the implementation
in the current CaseFolder will do) so that the dump data stays consistent and
this change has no visible impact on schema files.
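
As an illustration of the Dump() override, the sketch below contrasts an inherited dump that writes every setting with an override that writes only the class name. The struct, the function names, and the use of JSON text in a caller-supplied buffer are all assumptions for the sake of a runnable example, not Lucy's actual Dump() interface.

```c
#include <stdio.h>

/* Hypothetical, simplified object; not the real Lucy struct. */
typedef struct {
    const char *class_name;
    const char *normalization_form;
    int         case_fold;
    int         strip_accents;
} Normalizer;

/* What an inherited Dump() would write: class name plus all settings. */
int
normalizer_dump(const Normalizer *self, char *buf, size_t size) {
    return snprintf(buf, size,
                    "{\"_class\": \"%s\", \"case_fold\": %d, "
                    "\"normalization_form\": \"%s\", \"strip_accents\": %d}",
                    self->class_name, self->case_fold,
                    self->normalization_form, self->strip_accents);
}

/* Override for CaseFolder: writes only the class name, matching the
 * dump data already present in existing schema files. */
int
casefolder_dump(const Normalizer *self, char *buf, size_t size) {
    return snprintf(buf, size, "{\"_class\": \"%s\"}", self->class_name);
}
```

With the override in place, a CaseFolder keeps producing the short legacy form even though it now carries Normalizer's settings internally.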

Marvin Humphrey
