lucene-dev mailing list archives

From Nikola Tanković <nikola.tanko...@gmail.com>
Subject Re: GSoC: LUCENE-2308: Separately specify a field's type
Date Fri, 13 May 2011 17:35:25 GMT
2011/5/12 Michael McCandless <lucene@mikemccandless.com>

> 2011/5/9 Nikola Tanković <nikola.tankovic@gmail.com>:
>
> >> > Introduction of a FieldType class that will hold all the extra
> >> > properties
> >> > now stored inside Field instance other than field value itself.
> >>
> >> Seems like this is an easy first baby step -- leave current Field
> >> class, but break out the "type" details into a separate class that can
> >> be shared across Field instances.
> >
> > Yes, I agree, this could be a good first step. Mike submitted a patch on
> > issue #2308. I think it's a solid base for this.
>
> Make that Chris.
>

Ouch, sorry!
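
As a rough illustration of that first step, here is a minimal sketch of type
details broken out into a separate, read-only object that several Field
instances share. It is not taken from the LUCENE-2308 patch: the class names,
the three boolean properties and the constructor shapes are all assumptions
made for illustration.

```java
// Hypothetical sketch only; names and properties are assumptions,
// not the API from the LUCENE-2308 patch.
class FieldTypeSketch {

    // Holds the "type" details. All members are final, so an instance is
    // read-only once constructed and can be shared safely.
    static final class FieldType {
        private final boolean indexed;
        private final boolean stored;
        private final boolean tokenized;

        FieldType(boolean indexed, boolean stored, boolean tokenized) {
            this.indexed = indexed;
            this.stored = stored;
            this.tokenized = tokenized;
        }

        boolean indexed() { return indexed; }
        boolean stored() { return stored; }
        boolean tokenized() { return tokenized; }
    }

    // Keeps only the name, the value and a reference to the shared type.
    // The value is the only thing that can change after construction.
    static final class Field {
        private final String name;
        private final FieldType type;
        private String value;

        Field(String name, String value, FieldType type) {
            this.name = name;
            this.value = value;
            this.type = type;
        }

        String name() { return name; }
        FieldType type() { return type; }
        String value() { return value; }
        void setValue(String value) { this.value = value; }
    }

    public static void main(String[] args) {
        // One type object, reused by two fields.
        FieldType keyword = new FieldType(true, true, false);
        Field id = new Field("id", "doc-1", keyword);
        Field sku = new Field("sku", "A-42", keyword);

        id.setValue("doc-2"); // changing a value does not touch the type
        System.out.println(id.type() == sku.type());
        System.out.println(id.name() + "=" + id.value());
    }
}
```

Sharing one FieldType between fields only works if the type cannot change
underneath them, which is why every member in the sketch is final.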


>
> >> > New FieldTypeAttribute interface will be added to handle extension with
> >> > new field properties inspired by IndexWriterConfig.
> >>
> >> How would this work?  What's an example compelling usage?  An app
> >> could use this for extensibility, and then make a matching codec that
> >> picks up this attr?  EG, say, maybe for marking that a field is a
> >> "primary key field" and then codec could optimize accordingly...?
> >
> > Well, that could be a very interesting scenario. Codec usage didn't occur
> > to me, but it seems very reasonable. Attributes otherwise don't make much
> > sense, unless properly used in custom codecs.
> >
> > How will we ensure attribute and codec compatibility?
>
> I'm just thinking we should have concrete reasons in mind for cutting
> over to attributes here... I'd rather see a fixed, well thought out
> concrete FieldType hierarchy first...
>

Yes, I couldn't agree more, and I also think Chris has some great ideas in
this area, given his work on spatial indexing, which tends to make use of
these additional attributes.


>
> >> > Refactoring and dividing of settings for term frequency and positioning
> >> > can also be done (LUCENE-2048)
> >>
> >> Ahh great!  So we can omit-positions-but-not-TF.
> >>
> >> > Discuss possible effects of completion of LUCENE-2310 on this project
> >>
> >> This one is badly needed... but we should keep your project focused.
> >
> >
> > We'll tackle this one afterwards.
>
> Good.
>
> >> > Adequate Factory class for easier configuration of new Field instances
> >> > together with manually added new FieldTypeAttributes
> >> > FieldType, once instantiated, is read-only. Only the field's value can
> >> > be changed.
> >>
> >> OK.
> >>
> >> > Simple hierarchy of Field classes with core properties logically
> >> > predefaulted. E.g.:
> >> >
> >> > NumberField,
> >>
> >> Can't this just be our existing NumericField?
> >
> > Yes, this is the classic NumericField with the changes proposed in
> > LUCENE-2310. Tim Smith mentioned that the Fieldable class should be kept
> > for custom implementations to reduce the number of setters (for defaults).
> > Chris Male suggested a new CoreFieldTypeAttribute interface, so maybe it
> > should be implemented instead of Fieldable for custom implementations, so
> > both Fieldable and AbstractField are not needed anymore.
> > In my opinion Field should become abstract, extended by the others.
> > Another proposal: how about keeping only Field (with no hierarchy) and
> > moving the hierarchy to FieldType, such as NumericFieldType and
> > StringFieldType, since this hierarchy concerns type information only?
>
> I think hierarchy of both types and the "value containers" that hold
> the corresponding values could make sense?
>

Hmm, I think we should get more opinions on this one also.
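
To make the two options easier to compare, here is a minimal sketch of a
parallel hierarchy: XYZFieldType classes carry the type information, and
XYZField classes are thin value containers that pick the matching type by
default. Everything in it (the class bodies, the tokenized() property, the
shared constants) is an assumption for illustration, not an agreed design.

```java
// Hypothetical sketch of parallel "type" and "value container" hierarchies.
class FieldHierarchySketch {

    // Type side: describes how a field is treated and holds no value.
    static abstract class FieldType {
        abstract boolean tokenized();
    }

    // Roughly NOT_ANALYZED: the whole value is a single term.
    static final class StringFieldType extends FieldType {
        static final StringFieldType INSTANCE = new StringFieldType();
        boolean tokenized() { return false; }
    }

    // Roughly ANALYZED: the value is run through an analyzer.
    static final class TextFieldType extends FieldType {
        static final TextFieldType INSTANCE = new TextFieldType();
        boolean tokenized() { return true; }
    }

    static final class NumericFieldType extends FieldType {
        static final NumericFieldType INSTANCE = new NumericFieldType();
        boolean tokenized() { return false; }
    }

    // Value side: a generic container that works with any type...
    static class Field {
        final String name;
        final FieldType type;
        Object value;

        Field(String name, Object value, FieldType type) {
            this.name = name;
            this.value = value;
            this.type = type;
        }
    }

    // ...and convenience subclasses with the type predefaulted.
    static final class StringField extends Field {
        StringField(String name, String value) {
            super(name, value, StringFieldType.INSTANCE);
        }
    }

    static final class TextField extends Field {
        TextField(String name, String value) {
            super(name, value, TextFieldType.INSTANCE);
        }
    }

    static final class NumericField extends Field {
        NumericField(String name, Number value) {
            super(name, value, NumericFieldType.INSTANCE);
        }
    }

    public static void main(String[] args) {
        // Explicit form: generic Field plus a chosen type.
        Field explicit = new Field("price", 9.99, NumericFieldType.INSTANCE);
        // Convenience form: the subclass supplies the same type.
        Field shorthand = new NumericField("price", 9.99);
        System.out.println(explicit.type == shorthand.type);
    }
}
```

In this arrangement the XYZField classes add no state of their own, so
keeping only Field plus the type hierarchy remains possible; the subclasses
are purely a shorthand.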


>
> > e.g. Usage:
> > FieldType number = new NumericFieldType();
> > Field price = new Field();
> > price.setType(number);
> > // but this is much cleaner...
> > Field price = new NumericField();
> > so maybe we should have a parallel XYZField for each XYZFieldType...
> > Am I overcomplicating this?
> >>
> >> > StringField,
> >>
> >> This would be like NOT_ANALYZED?
> >
> > Yes, strings are often one word only. Or maybe we can name it NameField,
> > NonAnalyzedField or something.
>
> StringField sounds good actually...
>
> >> > TextField,
> >>
> >> This would be ANALYZED?
> >
> > Yes.
> >
>
> OK.
>
> >> > What is the best way to break this into small baby steps?
> >>
> >> Hopefully this becomes clearer as we iterate.
> >
> > Well, we know the first step: moving type details into FieldType class.
>
> Yes!
>
> Somehow tying into this as well is a stronger decoupling of the
> indexer from analysis/document.  Ie, what indexer needs of a document
> is very minimal -- just an iterable over indexed & stored values.
> Separately we can still provide a "full featured" Document class w/
> add, get, remove, etc., but that's "outside" of the indexer.
>

I'll get back to this one after additional research. Maybe we should do a
couple more iterations, then I'll summarize the conclusions.
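
For reference while researching, here is a minimal sketch of the decoupling
Mike describes: the indexer accepts nothing more than an iterable over fields
with indexed and stored values, and a "full featured" Document with add, get
and remove lives outside of it. The interface name IndexableField, the
ToyIndexer class and all method signatures are assumptions for illustration
and do not describe existing Lucene code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; none of these types are existing Lucene classes.
class IndexerViewSketch {

    // The minimal view of a field that the indexer needs.
    interface IndexableField {
        String name();
        String value();
        boolean indexed();
        boolean stored();
    }

    // Stand-in indexer: it consumes any Iterable and never sees Document.
    static final class ToyIndexer {
        final List<String> indexedTerms = new ArrayList<>();
        final List<String> storedValues = new ArrayList<>();

        void addDocument(Iterable<? extends IndexableField> fields) {
            for (IndexableField f : fields) {
                if (f.indexed()) indexedTerms.add(f.name() + ":" + f.value());
                if (f.stored()) storedValues.add(f.name() + "=" + f.value());
            }
        }
    }

    static final class SimpleField implements IndexableField {
        private final String name, value;
        private final boolean indexed, stored;

        SimpleField(String name, String value, boolean indexed, boolean stored) {
            this.name = name;
            this.value = value;
            this.indexed = indexed;
            this.stored = stored;
        }

        public String name() { return name; }
        public String value() { return value; }
        public boolean indexed() { return indexed; }
        public boolean stored() { return stored; }
    }

    // Full-featured document: convenient for applications, but to the
    // indexer it is just another Iterable.
    static final class Document implements Iterable<IndexableField> {
        private final List<IndexableField> fields = new ArrayList<>();

        void add(IndexableField f) { fields.add(f); }

        IndexableField get(String name) {
            for (IndexableField f : fields) {
                if (f.name().equals(name)) return f;
            }
            return null;
        }

        boolean remove(String name) {
            return fields.removeIf(f -> f.name().equals(name));
        }

        public Iterator<IndexableField> iterator() { return fields.iterator(); }
    }

    public static void main(String[] args) {
        Document doc = new Document();
        doc.add(new SimpleField("id", "doc-1", true, true));
        doc.add(new SimpleField("body", "hello world", true, false));

        ToyIndexer indexer = new ToyIndexer();
        indexer.addDocument(doc);
        System.out.println(indexer.indexedTerms);
        System.out.println(indexer.storedValues);
    }
}
```

Because addDocument only requires an Iterable, a plain List of fields would
be accepted just as well as the Document class.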


>
> Mike
>
> http://blog.mikemccandless.com


Nikola
