lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4591) Make StoredFieldsFormat more configurable
Date Thu, 06 Dec 2012 13:51:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511352#comment-13511352 ]

Adrien Grand commented on LUCENE-4591:
--------------------------------------

bq. We would like to store certain fields that require a different type of data structure
than the one currently supported, i.e., a document is not a simple list of fields, but a more
complex data structure.

How would you do that? As far as I know, the only way to access stored fields is through the
StoredFieldVisitor API, which only supports sequences of fields whose values are necessarily numbers,
strings or byte arrays.
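
To make the point about the visitor API concrete, here is a minimal, self-contained Java model of the visitor idea. It is a sketch under stated assumptions, not Lucene's API: `FlatFieldVisitor`, `FlatDocument` and `CollectingVisitor` are hypothetical names, and the real `StoredFieldVisitor` has more callbacks with different signatures. What it illustrates is that every stored value has to arrive through a number, string or byte-array callback, so a nested structure has no natural path through it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the visitor idea (NOT Lucene's StoredFieldVisitor):
// a stored document is a flat sequence of (name, value) pairs, and every value must be a
// number, a string or a byte array, because those are the only callbacks a visitor has.
abstract class FlatFieldVisitor {
    abstract void numericField(String name, Number value);
    abstract void stringField(String name, String value);
    abstract void binaryField(String name, byte[] value);
}

final class FlatDocument {
    private static final class StoredField {
        final String name;
        final Object value;
        StoredField(String name, Object value) { this.name = name; this.value = value; }
    }

    private final List<StoredField> fields = new ArrayList<>();

    // Rejects anything that cannot be replayed through one of the three callbacks.
    FlatDocument add(String name, Object value) {
        if (!(value instanceof Number || value instanceof String || value instanceof byte[])) {
            throw new IllegalArgumentException("unsupported stored value for " + name + ": " + value);
        }
        fields.add(new StoredField(name, value));
        return this;
    }

    // Replays the fields, in insertion order, to the visitor.
    void accept(FlatFieldVisitor visitor) {
        for (StoredField f : fields) {
            if (f.value instanceof Number) visitor.numericField(f.name, (Number) f.value);
            else if (f.value instanceof String) visitor.stringField(f.name, (String) f.value);
            else visitor.binaryField(f.name, (byte[]) f.value);
        }
    }
}

// Example visitor that records each field as "name=value".
final class CollectingVisitor extends FlatFieldVisitor {
    final List<String> seen = new ArrayList<>();

    @Override void numericField(String name, Number value) { seen.add(name + "=" + value); }
    @Override void stringField(String name, String value) { seen.add(name + "=" + value); }
    @Override void binaryField(String name, byte[] value) {
        seen.add(name + "=" + new String(value, StandardCharsets.UTF_8));
    }
}
```

A map or a nested list passed to `add` is rejected outright; the only way to store it is to serialize it into one of the three flat value types first.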

bq. However, this is kind of hacky, and we will have to keep our copy in sync with the original
implementation.

I think writing a different StoredFieldsFormat impl would be a better option, as your implementation
could still require only one disk seek in the worst case. You could reuse some components of CompressingStoredFieldsFormat
such as CompressionMode, and maybe we could expose others such as CompressingStoredFieldsFormatIndexWriter/Reader
(they write/read the stored fields index in a memory-efficient way).
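
The one-seek property can be illustrated with a small sketch, assuming a hypothetical layout that is not CompressingStoredFieldsFormat's actual file format: each document is appended as an opaque blob to a single data file, and a small in-memory index maps a docID to its start offset. The real format additionally compresses documents in chunks and encodes its index more compactly; both are left out here. `BlobStoreWriter`, `BlobStoreReader` and the `seeks` counter are illustrative names only.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not Lucene code): documents are appended as opaque byte blobs to one
// "data file", and an in-memory index maps docID -> start offset.
final class BlobStoreWriter {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream(); // stands in for the data file
    private final List<Integer> startOffsets = new ArrayList<>();

    // Appends one serialized document and remembers where it starts.
    void addDocument(byte[] blob) {
        startOffsets.add(data.size());
        data.write(blob, 0, blob.length);
    }

    // Builds the in-memory index: docID -> start offset, plus an end-of-data sentinel.
    BlobStoreReader finish() {
        int[] index = new int[startOffsets.size() + 1];
        for (int i = 0; i < startOffsets.size(); i++) {
            index[i] = startOffsets.get(i);
        }
        index[startOffsets.size()] = data.size();
        return new BlobStoreReader(data.toByteArray(), index);
    }
}

final class BlobStoreReader {
    private final byte[] dataFile; // stands in for the on-disk data file
    private final int[] index;     // kept in memory, so looking up an offset costs no I/O
    int seeks = 0;                 // number of positioned reads issued against the data file

    BlobStoreReader(byte[] dataFile, int[] index) {
        this.dataFile = dataFile;
        this.index = index;
    }

    // Worst case for any document: one seek to index[docID], then one sequential read.
    byte[] document(int docID) {
        seeks++;
        return Arrays.copyOfRange(dataFile, index[docID], index[docID + 1]);
    }
}
```

Keeping the index in memory is what bounds the cost: finding the offset needs no disk access, so only the read of the blob itself touches the data file.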

If you don't care about increasing the number of disk seeks and if you can encode/decode your
complex data structure into a small opaque binary blob, maybe DocValues could be an option
(Mike and Robert are adding per-field format support to DocValues in LUCENE-4547).
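
If the complex structure can be flattened into bytes, the application owns the encoding and Lucene only ever sees an opaque `byte[]`. The sketch below uses only `java.io` and assumes the structure is a map from names to lists of strings; `BlobCodec` is a hypothetical name, and the resulting bytes are what would be handed to a binary DocValues field (that Lucene call is deliberately not shown).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical codec: flattens a nested structure (name -> list of values) into one opaque
// byte[] and back. Lucene never looks inside the blob; only the application's codec does.
final class BlobCodec {
    static byte[] encode(Map<String, List<String>> doc) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(doc.size());                  // number of entries
            for (Map.Entry<String, List<String>> e : doc.entrySet()) {
                out.writeUTF(e.getKey());              // entry name
                out.writeInt(e.getValue().size());     // number of values in this entry
                for (String v : e.getValue()) {
                    out.writeUTF(v);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    static Map<String, List<String>> decode(byte[] blob) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            int entries = in.readInt();
            Map<String, List<String>> doc = new LinkedHashMap<>();
            for (int i = 0; i < entries; i++) {
                String key = in.readUTF();
                int n = in.readInt();
                List<String> values = new ArrayList<>(n);
                for (int j = 0; j < n; j++) {
                    values.add(in.readUTF());
                }
                doc.put(key, values);
            }
            return doc;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Because `writeUTF` is used, each string is limited to 65535 encoded bytes; a length-prefixed UTF-8 encoding would lift that limit.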
                
> Make StoredFieldsFormat more configurable
> -----------------------------------------
>
>                 Key: LUCENE-4591
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4591
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: 4.1
>            Reporter: Renaud Delbru
>             Fix For: 4.1
>
>
> The current StoredFieldsFormats are implemented with the assumption that only one type of StoredFieldsFormat is used by the index.
> We would like to be able to configure a StoredFieldsFormat per field, similarly to the PostingsFormat.
> There are a few issues that need to be solved to allow that:
> 1) allowing a segment suffix to be configured for the StoredFieldsFormat
> 2) implementing the SPI interface in StoredFieldsFormat
> 3) creating a PerFieldStoredFieldsFormat
> We propose to start with 1) by modifying the signatures of StoredFieldsFormat#fieldsReader and StoredFieldsFormat#fieldsWriter so that they use SegmentReadState and SegmentWriteState instead of the current set of parameters.
> Let us know what you think about this idea. If this is of interest, we can contribute a first patch for 1).
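
Proposals 1) and 3) can be illustrated by how a per-field wrapper might route each stored field to a named format and use that name as a segment suffix, so that two formats writing into the same segment never produce the same file name. This is a hypothetical sketch: `PerFieldStoredFieldsRouter` and its `fileName` naming scheme are assumptions for illustration, not the API or file layout that was eventually adopted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch: route each stored field to a named format and derive a per-format
// segment suffix so the formats' files stay apart within one segment.
// None of these types are Lucene classes.
final class PerFieldStoredFieldsRouter {
    private final Function<String, String> formatForField; // field name -> format name (used as suffix)

    PerFieldStoredFieldsRouter(Function<String, String> formatForField) {
        this.formatForField = formatForField;
    }

    // Hypothetical naming scheme: "<segment>_<suffix>.<ext>", or "<segment>.<ext>" without a suffix.
    static String fileName(String segment, String suffix, String ext) {
        return suffix.isEmpty() ? segment + "." + ext : segment + "_" + suffix + "." + ext;
    }

    // Groups fields by the file their format would write in this segment (sorted by file name).
    Map<String, List<String>> plan(String segment, List<String> fields, String ext) {
        Map<String, List<String>> filesToFields = new TreeMap<>();
        for (String field : fields) {
            String suffix = formatForField.apply(field);
            filesToFields
                .computeIfAbsent(fileName(segment, suffix, ext), k -> new ArrayList<>())
                .add(field);
        }
        return filesToFields;
    }
}
```

For example, routing fields whose names start with `graph` to a format named `Graph` and everything else to `Default` in segment `_0` yields two files, `_0_Default.<ext>` and `_0_Graph.<ext>`, each holding only its own fields.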
