lucene-dev mailing list archives

From Nicolas Lalevée (JIRA) <>
Subject [jira] Updated: (LUCENE-662) Extendable writer and reader of field data
Date Sun, 01 Apr 2007 19:26:32 GMT


Nicolas Lalevée updated LUCENE-662:

    Attachment: indexFormat.patch

Synchronized with the trunk, so the patch now includes the payload feature. This allowed me to move
the payload writing, which is currently done in two places, into a single class: DefaultPostingWriter.

Since my last update, nothing on the TODO list has been fixed; it all still needs to be done. Furthermore
there is a regression in the new patch: ensureOpen() is not correctly handled for lazily
loaded fields, and a test fails. This is because the FieldsReader no longer handles
it in my patch. As the data structure can be customized, lazy loading is delegated to
the FieldData created by the FieldsReader, so both instances have to communicate about
the closing of the streams. This adds a new item to the TODO list.
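One possible way for the FieldsReader and the lazy FieldData to "communicate about the closing of the streams" is to share a single closed flag: the reader owns it, and every lazy field it creates checks it before touching the stream. The sketch below is a minimal, self-contained illustration under that assumption. The class names SketchFieldsReader and LazyFieldData are hypothetical, plain java.io streams stand in for Lucene's own input classes, and this is not how the patch currently does it.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a FieldsReader: it owns the stored bytes and a
// "closed" flag that it shares with every lazy field it hands out.
class SketchFieldsReader {
    private final byte[] storage; // stands in for the stored-fields file
    private final AtomicBoolean closed = new AtomicBoolean(false);

    SketchFieldsReader(byte[] storage) {
        this.storage = storage;
    }

    // Defer decoding: the lazy field only remembers where its value starts.
    LazyFieldData lazyField(int offset) {
        return new LazyFieldData(storage, offset, closed);
    }

    void close() {
        closed.set(true);
    }
}

// Hypothetical lazy FieldData: decodes its value on first access and checks
// the reader's flag, so a load attempted after close() fails cleanly.
class LazyFieldData {
    private final byte[] storage;
    private final int offset;
    private final AtomicBoolean readerClosed; // same object as the reader's flag
    private String value; // cached after the first successful load

    LazyFieldData(byte[] storage, int offset, AtomicBoolean readerClosed) {
        this.storage = storage;
        this.offset = offset;
        this.readerClosed = readerClosed;
    }

    // Plays the role of ensureOpen(), but consults state owned by the reader.
    private void ensureOpen() {
        if (readerClosed.get()) {
            throw new IllegalStateException("fields reader is closed");
        }
    }

    String stringValue() throws IOException {
        if (value == null) {
            ensureOpen(); // only a pending load needs the underlying stream
            DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(storage, offset, storage.length - offset));
            value = in.readUTF();
        }
        return value;
    }
}
```

Because the flag object is shared rather than copied, closing the reader is a single write that every outstanding lazy field observes. In this sketch a value that was already loaded stays readable after close(); whether that matches the intended ensureOpen() semantics is a design choice left open here.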

As discussed on java-dev, here is a lighter patch with only the index format handling, without
the ability to redefine how data and postings are stored/retrieved.

> Extendable writer and reader of field data
> ------------------------------------------
>                 Key: LUCENE-662
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>            Reporter: Nicolas Lalevée
>            Priority: Minor
>         Attachments: entrytable.patch, generic-fieldIO-2.patch, generic-fieldIO-3.patch,
generic-fieldIO-4.patch, generic-fieldIO-5.patch, generic-fieldIO.patch, indexFormat-only.patch,
indexFormat.patch, indexFormat.patch, indexFormat.patch
> As discussed on the dev mailing list, I have modified Lucene to make it possible to define how the
data of a field is written to and read from the index.
> Basically, I have introduced the notion of an IndexFormat. It is in fact a factory of FieldsWriter
and FieldsReader. The IndexReader, the IndexWriter and the SegmentMerger use this
factory instead of doing a "new FieldsReader/Writer()".
> I have also introduced the notion of FieldData. It holds all the data of a field, and
also handles writing it to and reading it from a stream. I did it this way because in the current
design of Lucene, Fieldable is an interface, so methods with protected or package visibility
cannot be defined on it.
> A FieldsWriter just writes data into a stream via the FieldData of the field.
> A FieldsReader instantiates a FieldData depending on the field name. Then it uses the
FieldData to read the stream. Finally, it instantiates a Field with the field data.
> About compatibility, I think it is preserved, as I have written a DefaultIndexFormat that provides
a DefaultFieldsWriter and a DefaultFieldsReader. These implementations do exactly the job that
is done today.
> To achieve this modification, some classes and methods had to be changed from private and/or
final to public or protected.
> About lazy fields, I have implemented them in a more general way in the abstract class
FieldData, so they will be totally transparent to the Lucene user who
extends FieldData. The stream is kept in the FieldData and used as soon as stringValue()
(or a similar accessor) is called. Implementing it this way allowed me to handle the recently introduced
LOAD_FOR_MERGE: it is just a lazy field data, and when read() is called on this lazy field
data, the saved input stream is copied directly to the output stream.
> I have one last issue with this patch. The current design allows reading an index in an
old format and simply doing a writer.addIndexes() into a new format. With the new design you
cannot, because the writer will use the FieldData.write() provided by the reader, so the data is written back in the old format.
> Enjoy!
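To make the IndexFormat / FieldData split described above concrete, here is a minimal, self-contained sketch. All names and signatures are hypothetical simplifications, not the patch's actual API: plain java.io streams replace Lucene's own input/output classes, there are no field numbers or flags, and fields are assumed to be read back in the order they were written. It shows FieldData as an abstract class with package-private read/write methods (which an interface like Fieldable could not declare), an IndexFormat factory that writers and readers would go through instead of calling "new", and a custom format that chooses a FieldData implementation by field name.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical: a field value that knows how to (de)serialize itself.
abstract class FieldData {
    abstract void write(DataOutputStream out) throws IOException;
    abstract void read(DataInputStream in) throws IOException;
    abstract String stringValue();
}

// Default encoding: one modified-UTF-8 string (a stand-in for today's format).
class StringFieldData extends FieldData {
    private String value;
    StringFieldData() {}
    StringFieldData(String value) { this.value = value; }
    void write(DataOutputStream out) throws IOException { out.writeUTF(value); }
    void read(DataInputStream in) throws IOException { value = in.readUTF(); }
    String stringValue() { return value; }
}

// A custom encoding an application might plug in, e.g. a 4-byte int.
class IntFieldData extends FieldData {
    private int value;
    IntFieldData() {}
    IntFieldData(int value) { this.value = value; }
    void write(DataOutputStream out) throws IOException { out.writeInt(value); }
    void read(DataInputStream in) throws IOException { value = in.readInt(); }
    String stringValue() { return Integer.toString(value); }
}

interface FieldsWriter { void addField(String name, FieldData data) throws IOException; }
interface FieldsReader { FieldData readField(String name) throws IOException; }

// The factory that writer, reader and merger would ask instead of calling "new".
interface IndexFormat {
    FieldsWriter createFieldsWriter(DataOutputStream out);
    FieldsReader createFieldsReader(DataInputStream in);
    FieldData newFieldData(String fieldName); // picks the FieldData by field name
}

class DefaultIndexFormat implements IndexFormat {
    public FieldsWriter createFieldsWriter(DataOutputStream out) {
        // The writer only delegates: each FieldData encodes itself.
        return (name, data) -> data.write(out);
    }

    public FieldsReader createFieldsReader(DataInputStream in) {
        // The reader instantiates a FieldData for the field, then lets it read.
        return name -> {
            FieldData data = newFieldData(name);
            data.read(in);
            return data;
        };
    }

    public FieldData newFieldData(String fieldName) {
        return new StringFieldData();
    }
}

// Hypothetical custom format: a field named "price" is stored as an int.
class PriceAwareIndexFormat extends DefaultIndexFormat {
    @Override
    public FieldData newFieldData(String fieldName) {
        return "price".equals(fieldName) ? new IntFieldData() : super.newFieldData(fieldName);
    }
}
```

Because the reader lambda calls newFieldData() on the enclosing format instance, a subclass only has to override that one method to change how specific fields are decoded. The same structure also shows the addIndexes() problem raised above: a FieldData returned by an old-format reader still carries the old write() method.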

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
