lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-4765) Multi-valued docvalues field
Date Sun, 17 Feb 2013 01:02:12 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4765:
--------------------------------

    Attachment: LUCENE-4765.patch

Updated patch showing differences between trunk and branch.

I actually think this is ready:
* it's a docvalues field where you can add multiple instances to a document.
* these are dereferenced (like SORTED), except for each document you get an ordered list of
ordinals instead of a single one.
* transparent pass-thru to FieldCache.getDocTermOrds: so this "completes" dv in that we have
an index-time equivalent to what FieldCache provides.
* if you ask for FieldCache.getDocTermOrds, instead of insanity for a single-valued field
indexed by SORTED, you get a bridge API: so e.g. if we wanted we could start with a per-segment
facet API for solr that handles both single/multi-valued and specialize only if it increases
perf.
* all APIs are cut over, including join/ and grouping/, though while doing this I noticed an opportunity
to separately make join/ more efficient (LUCENE-4771)
* refactored DocValues default merge to be simpler (also the existing SORTED case), additionally
this benefits from the RAM improvements Adrien committed in LUCENE-4780.
* Lucene42 implementation uses an FST for the ord/term "dictionary", and the ordinal list
per-doc is essentially a BINARY entry (vint+dgap encoded, as this seems to be the most efficient
from the tests Shai et al have been doing with lucene/facets).
* SimpleText, Disk, Asserting, and CheapBastard codecs.
* I added random tests that basically index and delete lots of things and verify the contents
against stored fields, and DocTermOrds built in RAM from the indexed contents. 
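
To make the "vint+dgap" encoding of a per-document ordinal list concrete, here is a minimal standalone sketch. It is not the Lucene42 format code from the patch: the class name OrdListCodec and its methods are hypothetical, and the byte layout (7 data bits per byte, low-order group first, high bit as a continuation flag) is the common vint convention, assumed here for illustration. Each ordinal is stored as the gap from the previous one, so a sorted list of nearby ordinals shrinks to mostly single-byte entries.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration class; not part of Lucene or of the patch.
public class OrdListCodec {

    // Write a non-negative int as a variable-length int: 7 data bits per byte,
    // low-order group first, high bit set on every byte except the last.
    public static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encode an ascending list of ordinals as vint-encoded gaps (delta gaps).
    // The first gap is measured from 0, so it equals the first ordinal.
    public static byte[] encode(int[] sortedOrds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int ord : sortedOrds) {
            writeVInt(out, ord - previous);
            previous = ord;
        }
        return out.toByteArray();
    }

    // Decode the byte[] back into the ascending ordinal list.
    public static int[] decode(byte[] bytes) {
        int[] ords = new int[bytes.length]; // upper bound: at least one byte per ordinal
        int count = 0;
        int pos = 0;
        int previous = 0;
        while (pos < bytes.length) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta; // undo the delta-gap step
            ords[count++] = previous;
        }
        return Arrays.copyOf(ords, count);
    }

    public static void main(String[] args) {
        int[] ords = {3, 4, 130, 100000};
        byte[] encoded = encode(ords);
        System.out.println(encoded.length + " bytes, decoded: "
                + Arrays.toString(decode(encoded)));
    }
}
```

The encoded byte[] is what would be stored as the document's BINARY-style entry in this sketch; the ordinals themselves would still need the separate ord/term dictionary to be resolved back to terms.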

Just wanted to get the patch up for review for a while. In the meantime I'll continue to make
some commits: for example I want to add this type to IndexWriter's diskfull/exception/thread
interrupt/etc tests and the usual rounding out of things.

                
> Multi-valued docvalues field
> ----------------------------
>
>                 Key: LUCENE-4765
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4765
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Robert Muir
>         Attachments: LUCENE-4765.patch, LUCENE-4765.patch
>
>
> The general idea is basically the docvalues parallel to FieldCache.getDocTermOrds/UninvertedField
> Currently this stuff is used in e.g. grouping and join for multivalued fields, and in
> solr for faceting.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

