lucene-dev mailing list archives

From "Robert Muir (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-2621) Extend Codec to handle also stored fields and term vectors
Date Tue, 15 Nov 2011 21:19:51 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2621:
--------------------------------

    Attachment: LUCENE-2621_tv_fi_si.patch

Attached is a new patch between trunk and branch. I think it's at a point where it is ready for merging.
* term vectors and fieldinfos move to codec.
* segmentinfos is moved to codec (before, you could only realistically tweak a few things).
* term vectors are cut over to the flex APIs.
* much better testing of term vectors in CheckIndex.
* added SimpleText impls of term vectors, fieldinfos, and segmentinfos.
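
As a rough illustration of the idea behind these bullets (not the patch's actual API), a codec can be thought of as a bundle of pluggable per-file formats, with a plain-text implementation that is easy to inspect by eye. Everything below is a hypothetical Python stand-in: `FieldInfo`, `SimpleTextFieldInfosFormat`, `Codec`, and the tab-separated layout are assumptions made for this sketch; the real implementation is Java and lives in the attached patch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FieldInfo:
    # Hypothetical stand-in: field name, field number, and a term-vectors flag.
    name: str
    number: int
    has_vectors: bool


class SimpleTextFieldInfosFormat:
    """Hypothetical human-readable fieldinfos format: one field per line."""

    def write(self, infos: List[FieldInfo]) -> str:
        # Layout (assumed for this sketch): number <TAB> name <TAB> 0/1
        return "".join(
            f"{fi.number}\t{fi.name}\t{int(fi.has_vectors)}\n" for fi in infos
        )

    def read(self, text: str) -> List[FieldInfo]:
        infos = []
        for line in text.splitlines():
            number, name, has_vectors = line.split("\t")
            infos.append(FieldInfo(name, int(number), has_vectors == "1"))
        return infos


@dataclass
class Codec:
    # A codec bundles one format per kind of index data.
    # Only fieldinfos is sketched; term vectors and segmentinfos would be
    # further attributes of the same shape.
    name: str
    field_infos_format: SimpleTextFieldInfosFormat


codec = Codec("SimpleText", SimpleTextFieldInfosFormat())
infos = [FieldInfo("title", 0, False), FieldInfo("body", 1, True)]
encoded = codec.field_infos_format.write(infos)
decoded = codec.field_infos_format.read(encoded)
```

Swapping `field_infos_format` for another object with the same `write`/`read` methods changes the on-disk representation without touching the calling code, which is the kind of extensibility these bullets describe.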

After this I would propose closing this issue and opening followup issues for:
* make a new, more efficient term vector implementation for 4.0; the existing one would go
to preflex, and the preflex impl should reorder the terms correctly to UTF-8 order (this has been
a bug all along in trunk, not caused here!)
* see if we can remove the global .fnx file completely, as it's not per-segment and I'm not
sure it's totally necessary; perhaps the field number consistency can be achieved with another
mechanism. Otherwise, we should at least add a codec hack/hook so that PreFlexRW can write
segments without .fnx files.
* make preflex implementations of the various other readers/writers so that our 4.0 impls are
clean and don't contain backwards compatibility code, and so that we have more realistic testing
of backwards compatibility with PreFlexRW.
* allow adding offsets to the postings list impls, either via startOffset()/endOffset() or via
an attribute like term vectors do in this patch, so that a D&Penum can retrieve the offsets at
a position. This could make highlighting much faster without having to use vectors.
* try to make a few other things like deletes extendable via codec
* figure out a good design to cut over norms to DocValues.
* add a SimpleTextDocValues; it's sorely needed.
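
To make the term-ordering point in the first follow-up concrete, here is a minimal sketch of how UTF-8 byte order can differ from UTF-16 code-unit order. It assumes the older (preflex) term order is UTF-16 code-unit order, which the message does not state explicitly; the sample terms are made up for illustration.

```python
# Three sample terms: ASCII "a", U+FB01 (a BMP character above the
# surrogate range), and U+1D11E (a supplementary character).
terms = ["\U0001d11e", "\ufb01", "a"]

# UTF-8 byte order, which is the same as Unicode code point order.
utf8_order = sorted(terms, key=lambda t: t.encode("utf-8"))

# UTF-16 code-unit order: comparing big-endian UTF-16 bytes lexicographically
# is equivalent to comparing 16-bit code units one by one.
utf16_order = sorted(terms, key=lambda t: t.encode("utf-16-be"))

# U+1D11E is encoded in UTF-16 as the surrogate pair D834 DD1E, which sorts
# before FB01, while its UTF-8 lead byte F0 sorts after U+FB01's lead byte EF.
print([hex(ord(t)) for t in utf8_order])
print([hex(ord(t)) for t in utf16_order])
```

The two orders only disagree when supplementary characters are compared against BMP characters in the range U+E000 to U+FFFF, so any reordering would only have to deal with terms in that situation.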
                
> Extend Codec to handle also stored fields and term vectors
> ----------------------------------------------------------
>
>                 Key: LUCENE-2621
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2621
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Robert Muir
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>         Attachments: LUCENE-2621.patch, LUCENE-2621_rote.patch, LUCENE-2621_tv_fi_si.patch
>
>
> Currently the Codec API handles only writing/reading of term-related data, while stored fields
> data and term frequency vector data writing/reading is handled elsewhere.
> I propose to extend the Codec API to handle this data as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

