lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: GSoC: LUCENE-2308: Separately specify a field's type
Date Thu, 14 Apr 2011 20:18:56 GMT
2011/4/13 Nikola Tanković <nikola.tankovic@gmail.com>:
> Hi all,
> if everything goes well I'll be delighted to be part of this project this
> summer together with my assigned mentor Mike. My task will be to introduce
> new classes to Lucene core that will make it possible to separate a Field's
> Lucene properties from its value
> (https://issues.apache.org/jira/browse/LUCENE-2308).

Welcome Nikola!

> Changes will include:
>
> Introduction of a FieldType class that will hold all the extra properties
> now stored inside a Field instance, other than the field value itself.

Seems like this is an easy first baby step -- leave current Field
class, but break out the "type" details into a separate class that can
be shared across Field instances.
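
To make that first step concrete, here is a minimal sketch of what a shared "type" object could look like. The FieldType and Field classes below are hypothetical stand-ins written for illustration, not the existing Lucene classes, and the three boolean properties are only an assumed subset of what a real type would carry.

```java
// Hypothetical sketch: these classes are NOT the Lucene API.
public class SharedFieldTypeSketch {

    /** Immutable bundle of per-field indexing properties (illustrative subset). */
    static final class FieldType {
        final boolean indexed;
        final boolean stored;
        final boolean tokenized;

        FieldType(boolean indexed, boolean stored, boolean tokenized) {
            this.indexed = indexed;
            this.stored = stored;
            this.tokenized = tokenized;
        }
    }

    /** A field keeps only its name, its value, and a reference to a shared type. */
    static final class Field {
        final String name;
        final FieldType type;
        String value; // the value stays mutable; the type does not

        Field(String name, String value, FieldType type) {
            this.name = name;
            this.value = value;
            this.type = type;
        }
    }

    public static void main(String[] args) {
        // One type instance, reused by every "title" field.
        FieldType titleType = new FieldType(true, true, true);
        Field first = new Field("title", "first document", titleType);
        Field second = new Field("title", "second document", titleType);
        System.out.println(first.type == second.type); // prints "true"
    }
}
```

The point of the split is that the per-field flags are allocated once and referenced many times, instead of being copied into every Field instance.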

> A new FieldTypeAttribute interface will be added to handle extension with
> new field properties, inspired by IndexWriterConfig.

How would this work?  What's an example compelling usage?  An app
could use this for extensibility, and then make a matching codec that
picks up this attr?  EG, say, maybe for marking that a field is a
"primary key field" and then codec could optimize accordingly...?

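One way to picture the "primary key" idea is a marker attribute that a codec-like component checks for. Everything below is an assumption made for illustration: the FieldTypeAttribute interface, the PrimaryKeyAttribute marker, the attribute map on FieldType, and the choosePostingsStrategy helper are all hypothetical, and the strategy names are placeholders rather than real codec options.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Lucene under these names.
public class FieldTypeAttributeSketch {

    /** Marker interface for app-defined extensions to a field's type. */
    interface FieldTypeAttribute {}

    /** App-defined marker: at most one document per term in this field. */
    static final class PrimaryKeyAttribute implements FieldTypeAttribute {}

    /** A field type that can carry arbitrary attributes, keyed by their class. */
    static final class FieldType {
        private final Map<Class<? extends FieldTypeAttribute>, FieldTypeAttribute> attributes =
                new HashMap<>();

        <A extends FieldTypeAttribute> FieldType addAttribute(Class<A> key, A attribute) {
            attributes.put(key, attribute);
            return this;
        }

        boolean hasAttribute(Class<? extends FieldTypeAttribute> key) {
            return attributes.containsKey(key);
        }
    }

    /** Stand-in for a codec decision; the returned names are placeholders. */
    static String choosePostingsStrategy(FieldType type) {
        return type.hasAttribute(PrimaryKeyAttribute.class)
                ? "single-doc-per-term"
                : "default";
    }

    public static void main(String[] args) {
        FieldType idType = new FieldType()
                .addAttribute(PrimaryKeyAttribute.class, new PrimaryKeyAttribute());
        System.out.println(choosePostingsStrategy(idType));
        System.out.println(choosePostingsStrategy(new FieldType()));
    }
}
```
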
> The settings for term frequency and positions can also be refactored and
> split apart (LUCENE-2048)

Ahh great!  So we can omit-positions-but-not-TF.
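
A rough sketch of how the two settings could be separated: a single ordered setting with three levels instead of one all-or-nothing flag. The IndexDetail enum and its constant names are hypothetical and chosen only for this illustration.

```java
// Hypothetical sketch: this enum is an illustration, not a Lucene class.
public class IndexDetailSketch {

    /** How much per-term detail is written to the postings for a field. */
    enum IndexDetail {
        DOCS_ONLY,                     // no term frequencies, no positions
        DOCS_AND_FREQS,                // term frequencies, but no positions
        DOCS_AND_FREQS_AND_POSITIONS;  // everything

        boolean hasFreqs() {
            return this != DOCS_ONLY;
        }

        boolean hasPositions() {
            return this == DOCS_AND_FREQS_AND_POSITIONS;
        }
    }

    public static void main(String[] args) {
        // The case discussed above: keep TF, drop positions.
        IndexDetail detail = IndexDetail.DOCS_AND_FREQS;
        System.out.println(detail.hasFreqs() + " " + detail.hasPositions()); // prints "true false"
    }
}
```

Modelling this as one ordered setting rather than two independent booleans rules out the combination "positions without frequencies".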

> Discuss possible effects of completion of LUCENE-2310 on this project

This one is badly needed... but we should keep your project focused.

> An adequate Factory class for easier configuration of new Field instances,
> together with manually added new FieldTypeAttributes.
>
> FieldType, once instantiated, is read-only. Only the field's value can be
> changed.

OK.
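
A builder is one possible shape for the factory plus the read-only rule. This is a sketch under assumptions: the Builder class, its method names, and its default values are invented here and are not part of the proposal.

```java
// Hypothetical sketch: builder and property names are assumptions.
public class FieldTypeBuilderSketch {

    /** Read-only once built: no setters, all fields final. */
    static final class FieldType {
        private final boolean indexed;
        private final boolean stored;
        private final boolean tokenized;

        private FieldType(Builder b) {
            this.indexed = b.indexed;
            this.stored = b.stored;
            this.tokenized = b.tokenized;
        }

        boolean isIndexed() { return indexed; }
        boolean isStored() { return stored; }
        boolean isTokenized() { return tokenized; }
    }

    /** Mutable configuration step; defaults here are arbitrary. */
    static final class Builder {
        private boolean indexed = true;
        private boolean stored = false;
        private boolean tokenized = true;

        Builder indexed(boolean v) { this.indexed = v; return this; }
        Builder stored(boolean v) { this.stored = v; return this; }
        Builder tokenized(boolean v) { this.tokenized = v; return this; }

        /** Copies the current settings into an immutable FieldType. */
        FieldType build() { return new FieldType(this); }
    }

    public static void main(String[] args) {
        Builder builder = new Builder().stored(true).tokenized(false);
        FieldType idType = builder.build();
        builder.stored(false); // later builder changes do not affect idType
        System.out.println(idType.isStored()); // prints "true"
    }
}
```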

> Simple hierarchy of Field classes with core properties logically
> predefaulted. E.g.:
>
> NumberField,

Can't this just be our existing NumericField?

> StringField,

This would be like NOT_ANALYZED?

> TextField,

This would be ANALYZED?

> NonIndexedField,

This would be only stored?
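
If the answers to the three questions above are "yes", the hierarchy could be sketched roughly as follows. All classes here are hypothetical illustrations of that reading, with the flags standing in for NOT_ANALYZED, ANALYZED, and stored-only; NumberField is left out because the existing NumericField may already cover it.

```java
// Hypothetical sketch: these classes illustrate the proposed defaults only.
public class FieldHierarchySketch {

    /** Base field: the flags are fixed at construction, the value can change. */
    static class Field {
        final String name;
        final boolean indexed;
        final boolean tokenized;
        final boolean stored;
        String value;

        Field(String name, String value, boolean indexed, boolean tokenized, boolean stored) {
            this.name = name;
            this.value = value;
            this.indexed = indexed;
            this.tokenized = tokenized;
            this.stored = stored;
        }
    }

    /** Indexed as a single token (the NOT_ANALYZED reading). */
    static final class StringField extends Field {
        StringField(String name, String value, boolean stored) {
            super(name, value, true, false, stored);
        }
    }

    /** Indexed and run through the analyzer (the ANALYZED reading). */
    static final class TextField extends Field {
        TextField(String name, String value, boolean stored) {
            super(name, value, true, true, stored);
        }
    }

    /** Stored only, never indexed. */
    static final class NonIndexedField extends Field {
        NonIndexedField(String name, String value) {
            super(name, value, false, false, true);
        }
    }

    public static void main(String[] args) {
        Field id = new StringField("id", "doc-42", true);
        Field body = new TextField("body", "some analyzed text", false);
        Field blob = new NonIndexedField("payload", "opaque data");
        System.out.println(id.tokenized + " " + body.tokenized + " " + blob.indexed);
        // prints "false true false"
    }
}
```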

> My questions and issues:
>
> Backward compatibility? Will this go to Lucene 3.0?

Maybe focus on 4.0 for starters and then if there's a nice backport we
can do that...?

> What is the best way to break this into small baby steps?

Hopefully this becomes clearer as we iterate.

Mike
