lucene-dev mailing list archives

From Nikola Tanković <nikola.tanko...@gmail.com>
Subject Re: GSoC: LUCENE-2308: Separately specify a field's type
Date Mon, 09 May 2011 19:45:19 GMT
My answers are inline.

2011/4/14 Michael McCandless <lucene@mikemccandless.com>

> 2011/4/13 Nikola Tanković <nikola.tankovic@gmail.com>:
> > Hi all,
> > if everything goes well I'll be delighted to be part of this project this
> > summer together with my assigned mentor Mike. My task will be to introduce
> > new classes to Lucene core that make it possible to separate a Field's
> > Lucene properties from its value
> > (https://issues.apache.org/jira/browse/LUCENE-2308).
>
> Welcome Nikola!
>
> > Changes will include:
> >
> > Introduction of a FieldType class that will hold all the extra properties
> > now stored inside the Field instance, other than the field value itself.
>
> Seems like this is an easy first baby step -- leave current Field
> class, but break out the "type" details into a separate class that can
> be shared across Field instances.
>

Yes, I agree, this could be a good first step. Mike submitted a patch on
LUCENE-2308. I think it's a solid base for this.
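
To make the first step concrete, here is a minimal sketch of what "type details in a separate class shared across Field instances" could look like. It is not the patch on the issue and not Lucene API: the names SketchFieldType and SketchField, and the three flags, are assumptions made up for illustration. The type is read-only once built, and only the field's value can change, as proposed in this thread.

```java
// Hypothetical sketch only: these are not Lucene classes and not the LUCENE-2308 patch.

/** Holds the per-field settings; every member is final, so an instance is read-only. */
final class SketchFieldType {
    private final boolean indexed;
    private final boolean stored;
    private final boolean tokenized;

    SketchFieldType(boolean indexed, boolean stored, boolean tokenized) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
    }

    boolean isIndexed() { return indexed; }
    boolean isStored() { return stored; }
    boolean isTokenized() { return tokenized; }
}

/** A field is a name, a shared type, and a value; only the value can change. */
final class SketchField {
    private final String name;
    private final SketchFieldType type;
    private String value;

    SketchField(String name, SketchFieldType type, String value) {
        this.name = name;
        this.type = type;
        this.value = value;
    }

    String name() { return name; }
    SketchFieldType type() { return type; }
    String value() { return value; }
    void setValue(String value) { this.value = value; }
}

class SharedFieldTypeDemo {
    public static void main(String[] args) {
        // One type object describes every "title" field.
        SketchFieldType titleType = new SketchFieldType(true, true, true);
        SketchField first = new SketchField("title", titleType, "first document");
        SketchField second = new SketchField("title", titleType, "second document");

        // Reuse a Field instance for another document by swapping only its value.
        first.setValue("third document");

        System.out.println(first.type() == second.type());
        System.out.println(first.value());
    }
}
```

Sharing one type object means the settings are stated once instead of being repeated on every Field instance.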


>
> > New FieldTypeAttribute interface will be added to handle extension with new
> > field properties inspired by IndexWriterConfig.
>
> How would this work?  What's an example compelling usage?  An app
> could use this for extensibility, and then make a matching codec that
> picks up this attr?  EG, say, maybe for marking that a field is a
> "primary key field" and then codec could optimize accordingly...?
>

Well, that could be a very interesting scenario. Codec usage didn't occur to me,
but it seems very reasonable. Otherwise attributes don't make much sense,
unless they are properly used in custom codecs.

How will we ensure attribute and codec compatibility?
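
One way to explore the compatibility question is for a codec to look up an attribute by its class and fall back to default behaviour when it is absent. The sketch below only illustrates that idea. Every name in it (SketchAttribute, PrimaryKeyAttribute, AttributedFieldType, SketchCodec) is hypothetical, and nothing here is existing Lucene or codec API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in Lucene.

/** Marker interface for extra, app-defined field properties. */
interface SketchAttribute {}

/** Example attribute: marks a field as the primary key. */
final class PrimaryKeyAttribute implements SketchAttribute {}

/** A field type that carries attributes, keyed by their concrete class. */
final class AttributedFieldType {
    private final Map<Class<? extends SketchAttribute>, SketchAttribute> attributes;

    AttributedFieldType(SketchAttribute... attrs) {
        Map<Class<? extends SketchAttribute>, SketchAttribute> map = new HashMap<>();
        for (SketchAttribute a : attrs) {
            map.put(a.getClass(), a);
        }
        this.attributes = Collections.unmodifiableMap(map);
    }

    /** Returns the attribute of the given class, or null if this type does not carry it. */
    <A extends SketchAttribute> A get(Class<A> key) {
        return key.cast(attributes.get(key));
    }
}

/** A codec that understands PrimaryKeyAttribute and ignores everything else. */
final class SketchCodec {
    String chooseFormat(AttributedFieldType type) {
        // An unknown or missing attribute falls back to the default format.
        return type.get(PrimaryKeyAttribute.class) != null
                ? "primary-key-optimized"
                : "default";
    }
}

class AttributeCodecDemo {
    public static void main(String[] args) {
        SketchCodec codec = new SketchCodec();
        System.out.println(codec.chooseFormat(new AttributedFieldType(new PrimaryKeyAttribute())));
        System.out.println(codec.chooseFormat(new AttributedFieldType()));
    }
}
```

With this shape, a codec that does not know an attribute simply never asks for it, so an unmatched attribute is ignored rather than causing a failure. Whether silently ignoring it is acceptable is exactly the open question.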

> > Refactoring and dividing of settings for term frequency and positioning can
> > also be done (LUCENE-2048)
>
> Ahh great!  So we can omit-positions-but-not-TF.
>
> > Discuss possible effects of completion of LUCENE-2310 on this project
>
> This one is badly needed... but we should keep your project focused.
>

We'll tackle this one afterwards.
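
Going back to the LUCENE-2048 point above (omit positions but keep term frequencies): assuming the setting today is a single on/off switch covering both, the split could be modelled as a three-level setting. The enum name and its constants below are illustrative assumptions, not Lucene API.

```java
// Hypothetical sketch only: the enum name and constants are illustrative.

/** How much postings detail to index for a field. */
enum PostingsDetail {
    DOCS_ONLY(false, false),
    DOCS_AND_FREQS(true, false),          // the "omit positions but not TF" case
    DOCS_FREQS_AND_POSITIONS(true, true);

    private final boolean freqs;
    private final boolean positions;

    PostingsDetail(boolean freqs, boolean positions) {
        this.freqs = freqs;
        this.positions = positions;
    }

    boolean hasFreqs() { return freqs; }
    boolean hasPositions() { return positions; }
}

class PostingsDetailDemo {
    public static void main(String[] args) {
        for (PostingsDetail d : PostingsDetail.values()) {
            System.out.println(d + " freqs=" + d.hasFreqs() + " positions=" + d.hasPositions());
        }
    }
}
```

An enum rules out the meaningless combination (positions without frequencies) that two independent booleans would allow.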


> > Adequate Factory class for easier configuration of new Field instances
> > together with manually added new FieldTypeAttributes
> >
> > FieldType, once instantiated, is read-only. Only the field's value can be
> > changed.
>
> OK.
>
> > Simple hierarchy of Field classes with core properties logically
> > predefaulted. E.g.:
> >
> > NumberField,
>
> Can't this just be our existing NumericField?
>

Yes, this is the classic NumericField with the changes proposed in LUCENE-2310. Tim
Smith mentioned that the Fieldable class should be kept for custom
implementations to reduce the number of setters (for defaults).
Chris Male suggested a new CoreFieldTypeAttribute interface, so maybe custom
implementations should implement that instead of Fieldable, and then
neither Fieldable nor AbstractField would be needed anymore.
In my opinion Field should become abstract and be extended by the others.

Another proposal: how about keeping only Field (with no hierarchy) and moving
the hierarchy to FieldType, e.g. NumericFieldType, StringFieldType, since this
hierarchy concerns type information only?

e.g. Usage:

FieldType number = new NumericFieldType();
Field price = new Field();
price.setType(number);

// but this is much cleaner...

Field price = new NumericField();

so maybe we should have a parallel XYZField for each XYZFieldType...

Am I overcomplicating this?
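
For comparison, the two styles can coexist if each XYZField is only a thin convenience wrapper that presets its XYZFieldType. The sketch below assumes exactly that; the class names (HierFieldType, NumericTypeSketch, HierField, NumericFieldSketch) are hypothetical and are not the existing NumericField or any proposed API.

```java
// Hypothetical sketch only: class names are illustrative and do not match the Lucene API.

/** Base of the type hierarchy; carries type information only. */
class HierFieldType {
    private final boolean tokenized;

    HierFieldType(boolean tokenized) {
        this.tokenized = tokenized;
    }

    boolean isTokenized() { return tokenized; }
}

/** Type preset for numbers; assumed here to skip tokenization. */
final class NumericTypeSketch extends HierFieldType {
    NumericTypeSketch() {
        super(false);
    }
}

/** Plain field: the caller sets the type explicitly. */
class HierField {
    private final String name;
    private HierFieldType type;
    private Object value;

    HierField(String name) {
        this.name = name;
    }

    final void setType(HierFieldType type) { this.type = type; }
    final void setValue(Object value) { this.value = value; }
    String name() { return name; }
    HierFieldType type() { return type; }
    Object value() { return value; }
}

/** Convenience subclass: same field, with the numeric type already applied. */
final class NumericFieldSketch extends HierField {
    private static final NumericTypeSketch TYPE = new NumericTypeSketch();

    NumericFieldSketch(String name, Number value) {
        super(name);
        setType(TYPE);
        setValue(value);
    }
}

class ParallelHierarchyDemo {
    public static void main(String[] args) {
        // Explicit style: a plain field plus a type object.
        HierField explicit = new HierField("price");
        explicit.setType(new NumericTypeSketch());
        explicit.setValue(9.99);

        // Convenience style: the subclass fills in the type.
        HierField shortcut = new NumericFieldSketch("price", 9.99);

        System.out.println(explicit.type().isTokenized() == shortcut.type().isTokenized());
    }
}
```

In this arrangement the Field subclasses add no state of their own, so the real hierarchy lives in the types and the XYZField classes are just shortcuts.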


> > StringField,
>
> This would be like NOT_ANALYZED?
>

Yes, such strings are often a single word. Or maybe we could name it NameField,
NonAnalyzedField or something similar.


>
> > TextField,
>
> This would be ANALYZED?
>

Yes.
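
The predefaulted field kinds from the proposal could also be expressed as factory methods on the type, which would cover the "Factory class" item as well. A rough sketch follows, with assumed defaults: the thread only fixes analyzed vs. not analyzed vs. stored-only, so the remaining flag values, along with the names PresetType, string(), text() and storedOnly(), are guesses for illustration.

```java
// Hypothetical sketch only: names and default flag values are assumptions.

/** Read-only type with factory methods for the proposed presets. */
final class PresetType {
    private final boolean indexed;
    private final boolean stored;
    private final boolean analyzed;

    private PresetType(boolean indexed, boolean stored, boolean analyzed) {
        this.indexed = indexed;
        this.stored = stored;
        this.analyzed = analyzed;
    }

    /** StringField-like: indexed as-is, not analyzed (NOT_ANALYZED). */
    static PresetType string() {
        return new PresetType(true, false, false);
    }

    /** TextField-like: indexed and analyzed (ANALYZED). */
    static PresetType text() {
        return new PresetType(true, false, true);
    }

    /** NonIndexedField-like: stored only, never indexed. */
    static PresetType storedOnly() {
        return new PresetType(false, true, false);
    }

    boolean isIndexed() { return indexed; }
    boolean isStored() { return stored; }
    boolean isAnalyzed() { return analyzed; }
}

class PresetTypeDemo {
    public static void main(String[] args) {
        PresetType id = PresetType.string();
        PresetType body = PresetType.text();
        PresetType blob = PresetType.storedOnly();
        System.out.println("string analyzed=" + id.isAnalyzed());
        System.out.println("text analyzed=" + body.isAnalyzed());
        System.out.println("storedOnly indexed=" + blob.isIndexed());
    }
}
```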


>
> > NonIndexedField,
>
> This would be only stored?
>
> > My questions and issues:
> >
> > Backward compatibility? Will this go to Lucene 3.0?
>
> Maybe focus on 4.0 for starters and then if there's a nice backport we
> can do that...?
>

OK, that also seems reasonable.


>
> > What is the best way to break this into small baby steps?
>
> Hopefully this becomes clearer as we iterate.
>

Well, we know the first step: moving the type details into a FieldType class.


>
> Mike
>
