2011/5/12 Michael McCandless <lucene@mikemccandless.com>
2011/5/9 Nikola Tanković <nikola.tankovic@gmail.com>:

>> > Introduction of a FieldType class that will hold all the extra
>> > properties now stored inside a Field instance, other than the field
>> > value itself.
>>
>> Seems like this is an easy first baby step -- leave current Field
>> class, but break out the "type" details into a separate class that can
>> be shared across Field instances.
>
> Yes, I agree, this could be a good first step. Mike submitted a patch on
> issue #2308. I think it's a solid base for this.

Make that Chris.

Ouch, sorry!
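
To make the first step concrete, here is a minimal sketch of type details broken out into a class that several Field instances share. The class shapes, the three boolean settings, and the accessor names are assumptions chosen for illustration; they are not taken from the patch mentioned above or from any Lucene release. The type is immutable here, in line with the read-only FieldType proposed further down the thread.

```java
// Editorial sketch only: hypothetical classes, not the real Lucene API.

/** Immutable bundle of per-field settings, shareable across Field instances. */
final class FieldType {
    private final boolean indexed;
    private final boolean stored;
    private final boolean tokenized;

    FieldType(boolean indexed, boolean stored, boolean tokenized) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
    }

    boolean indexed()   { return indexed; }
    boolean stored()    { return stored; }
    boolean tokenized() { return tokenized; }
}

/** A field keeps only its name, its value, and a reference to a shared type. */
final class Field {
    private final String name;
    private final FieldType type;
    private String value;

    Field(String name, FieldType type, String value) {
        this.name = name;
        this.type = type;
        this.value = value;
    }

    String name()    { return name; }
    FieldType type() { return type; }
    String value()   { return value; }

    /** The value may change; the type may not. */
    void setValue(String value) { this.value = value; }
}

class FieldTypeDemo {
    public static void main(String[] args) {
        FieldType idType = new FieldType(true, true, false);
        Field id = new Field("id", idType, "doc-1");
        Field parent = new Field("parentId", idType, "doc-0");
        // Both fields reference the same type object; only values differ.
        System.out.println(id.type() == parent.type());
    }
}
```

Because the type has no setters, sharing one instance across many fields is safe, and changing a field's value never affects its siblings.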
 

>> > New FieldTypeAttribute interface will be added to handle extension with
>> > new field properties inspired by IndexWriterConfig.
>>
>> How would this work?  What's an example compelling usage?  An app
>> could use this for extensibility, and then make a matching codec that
>> picks up this attr?  EG, say, maybe for marking that a field is a
>> "primary key field" and then codec could optimize accordingly...?
>
> Well, that could be a very interesting scenario. Codec usage hadn't
> occurred to me, but it seems very reasonable. Attributes otherwise
> don't make much sense, unless properly used in custom codecs.
>
> How will we ensure attribute and codec compatibility?

I'm just thinking we should have concrete reasons in mind for cutting
over to attributes here... I'd rather see a fixed, well thought out
concrete FieldType hierarchy first...

Yes, I couldn't agree more, and I also think Chris has some great ideas in this area, given his work on spatial indexing, which tends to make use of these additional attributes.
 

>> > Refactoring and dividing of settings for term frequency and positioning
>> > can also be done (LUCENE-2048)
>>
>> Ahh great!  So we can omit-positions-but-not-TF.
>>
>> > Discuss possible effects of completion of LUCENE-2310 on this project
>>
>> This one is badly needed... but we should keep your project focused.
>
>
> We'll tackle this one afterwards.

Good.
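
One way to picture the omit-positions-but-not-TF split discussed above is a three-level setting instead of a single on/off switch. The enum name and its constants below are hypothetical, made up for this sketch; they are not what LUCENE-2048 proposes.

```java
// Editorial sketch only: hypothetical names, not the real Lucene API.

/** How much postings detail to record for an indexed field. */
enum PostingsDetail {
    DOCS_ONLY(false, false),
    DOCS_AND_FREQS(true, false),            // the new middle option
    DOCS_FREQS_AND_POSITIONS(true, true);

    final boolean termFreqs;
    final boolean positions;

    PostingsDetail(boolean termFreqs, boolean positions) {
        this.termFreqs = termFreqs;
        this.positions = positions;
    }
}

class PostingsDetailDemo {
    public static void main(String[] args) {
        for (PostingsDetail d : PostingsDetail.values()) {
            System.out.println(d + " termFreqs=" + d.termFreqs
                    + " positions=" + d.positions);
        }
    }
}
```

An enum rather than two independent booleans is a design choice in this sketch: it leaves no way to ask for positions without term frequencies.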

>> > Adequate Factory class for easier configuration of new Field instances
>> > together with manually added new FieldTypeAttributes
>> > FieldType, once instantiated, is read-only. Only the field's value can be
>> > changed.
>>
>> OK.
>>
>> > Simple hierarchy of Field classes with core properties logically
>> > predefaulted. E.g.:
>> >
>> > NumberField,
>>
>> Can't this just be our existing NumericField?
>
> Yes, this is classic NumericField with changes proposed in LUCENE-2310. Tim
> Smith mentioned that Fieldable class should be kept for custom
> implementations to reduce number of setters (for defaults).
> Chris Male suggested new CoreFieldTypeAttribute interface, so maybe it
> should be implemented instead of Fieldable for custom implementations, so
> both Fieldable and AbstractField are not needed anymore.
> In my opinion Field should become abstract and be extended by the others.
> Another proposal: how about keeping only Field (with no hierarchy) and move
> hierarchy to FieldType, such as NumericFieldType, StringFieldType since this
> hierarchy concerns type information only?

I think hierarchy of both types and the "value containers" that hold
the corresponding values could make sense?

Hmm, I think we should get more opinions on this one also.
 

> e.g. Usage:
> FieldType number = new NumericFieldType();
> Field price = new Field();
> price.setType(number);
> // but this is much cleaner...
> Field price = new NumericField();
> so maybe we should have a parallel XYZField for each XYZFieldType...
> Am I complicating?
>>
>> > StringField,
>>
>> This would be like NOT_ANALYZED?
>
> Yes, strings are often one word only. Or maybe we can name it NameField,
> NonAnalyzedField or something.

StringField sounds good actually...

>> > TextField,
>>
>> This would be ANALYZED?
>
> Yes.
>

OK.
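
The StringField and TextField idea, with core properties predefaulted, might look roughly like the following. Everything here is an assumption for illustration: the class shapes, the public TYPE constants, and in particular the choice of stored=false as the default, which the thread does not settle.

```java
// Editorial sketch only: hypothetical classes, not the real Lucene API.

/** Read-only type settings. */
final class FieldType {
    final boolean indexed;
    final boolean stored;
    final boolean tokenized;

    FieldType(boolean indexed, boolean stored, boolean tokenized) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
    }
}

/** Base value container: a name, a value, and a shared type. */
class Field {
    private final String name;
    private final FieldType type;
    private String value;

    Field(String name, FieldType type, String value) {
        this.name = name;
        this.type = type;
        this.value = value;
    }

    String name()    { return name; }
    FieldType type() { return type; }
    String value()   { return value; }
    void setValue(String value) { this.value = value; }
}

/** Indexed as a single token, i.e. the NOT_ANALYZED case. */
final class StringField extends Field {
    static final FieldType TYPE = new FieldType(true, false, false);

    StringField(String name, String value) { super(name, TYPE, value); }
}

/** Run through the analyzer, i.e. the ANALYZED case. */
final class TextField extends Field {
    static final FieldType TYPE = new FieldType(true, false, true);

    TextField(String name, String value) { super(name, TYPE, value); }
}

class PredefaultedFieldsDemo {
    public static void main(String[] args) {
        Field isbn = new StringField("isbn", "isbn-0001");
        Field body = new TextField("body", "some analyzed text");
        System.out.println(isbn.name() + " tokenized=" + isbn.type().tokenized);
        System.out.println(body.name() + " tokenized=" + body.type().tokenized);
    }
}
```

This pairs each XYZField with one shared type constant, so callers get the short `new StringField(...)` form while the type information still lives in a single FieldType object.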

>> > What is the best way to break this into small baby steps?
>>
>> Hopefully this becomes clearer as we iterate.
>
> Well, we know the first step: moving type details into FieldType class.

Yes!

Somehow tying into this as well is a stronger decoupling of the
indexer from analysis/document.  Ie, what indexer needs of a document
is very minimal -- just an iterable over indexed & stored values.
Separately we can still provide a "full featured" Document class w/
add, get, remove, etc., but that's "outside" of the indexer.

I'll get back to this one after additional research. Maybe we should do a couple more iterations, then I'll summarize the conclusions.
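
As a starting point for that research, the decoupling described above can be sketched as an indexer that accepts nothing more than an iterable of values, with a fuller Document class living outside it. All names below (IndexableValue, ToyIndexer, SimpleField, and this Document) are hypothetical and exist only for this sketch; the counting indexer stands in for real indexing work.

```java
// Editorial sketch only: hypothetical classes, not the real Lucene API.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** The minimal view of a field that the indexer needs. */
interface IndexableValue {
    String name();
    String value();
    boolean indexed();
    boolean stored();
}

/** Stand-in indexer: it sees only an Iterable, never a Document. */
final class ToyIndexer {
    int indexedCount;
    int storedCount;

    void addDocument(Iterable<? extends IndexableValue> doc) {
        for (IndexableValue v : doc) {
            if (v.indexed()) indexedCount++;
            if (v.stored()) storedCount++;
        }
    }
}

final class SimpleField implements IndexableValue {
    private final String name;
    private final String value;
    private final boolean indexed;
    private final boolean stored;

    SimpleField(String name, String value, boolean indexed, boolean stored) {
        this.name = name;
        this.value = value;
        this.indexed = indexed;
        this.stored = stored;
    }

    public String name()     { return name; }
    public String value()    { return value; }
    public boolean indexed() { return indexed; }
    public boolean stored()  { return stored; }
}

/** "Full featured" container with add/get/remove, outside the indexer. */
final class Document implements Iterable<IndexableValue> {
    private final List<IndexableValue> fields = new ArrayList<>();

    void add(IndexableValue f) { fields.add(f); }

    /** Returns the first field with this name, or null if there is none. */
    IndexableValue get(String name) {
        for (IndexableValue f : fields) {
            if (f.name().equals(name)) return f;
        }
        return null;
    }

    void remove(String name) { fields.removeIf(f -> f.name().equals(name)); }

    @Override
    public Iterator<IndexableValue> iterator() { return fields.iterator(); }
}
```

Since the indexer depends only on the interface, an application could pass any other Iterable (for example a plain List of values) without touching Document at all.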
 

Mike

http://blog.mikemccandless.com

Nikola