lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Created] (LUCENE-5159) compressed diskdv sorted/sortedset termdictionaries
Date Thu, 08 Aug 2013 17:23:48 GMT
Robert Muir created LUCENE-5159:

             Summary: compressed diskdv sorted/sortedset termdictionaries
                 Key: LUCENE-5159
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/index
            Reporter: Robert Muir

Sorted/SortedSet give you ordinal(s) per document, but they separately have a "term dictionary"
of all the values.

You can do a few operations on these:
* ord -> term lookup (e.g. retrieving facet labels)
* term -> ord lookup (reverse lookup: e.g. fieldcacherangefilter)
* get a term enumerator (e.g. merging, ordinalmap construction)
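
To make the three operations concrete, here is a minimal in-memory sketch. `SimpleTermDictionary` and its methods are hypothetical illustrations, not Lucene's API. Java `String` ordering stands in for the byte order Lucene uses for terms. Ordinals are simply positions in the sorted, de-duplicated term list.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical in-memory term dictionary: ordinal i is the i-th term in sorted order.
final class SimpleTermDictionary {
  private final String[] sortedTerms;

  SimpleTermDictionary(String... terms) {
    // Dedupe and sort so that ordinals follow term order, as in a sorted field.
    this.sortedTerms = Arrays.stream(terms).distinct().sorted().toArray(String[]::new);
  }

  int valueCount() {
    return sortedTerms.length;
  }

  // ord -> term, e.g. turning a facet ordinal back into a label.
  String lookupOrd(int ord) {
    return sortedTerms[ord];
  }

  // term -> ord by binary search; returns -(insertionPoint) - 1 when the term is absent.
  int lookupTerm(String term) {
    return Arrays.binarySearch(sortedTerms, term);
  }

  // Sequential enumeration in ord order, e.g. for merging or building an ordinal map.
  Iterator<String> termsEnum() {
    return new Iterator<String>() {
      private int next = 0;

      @Override
      public boolean hasNext() {
        return next < sortedTerms.length;
      }

      @Override
      public String next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return sortedTerms[next++];
      }
    };
  }
}
```

For example, `new SimpleTermDictionary("pear", "apple", "fig", "apple")` holds three unique values. In it, `lookupOrd(0)` is `"apple"` and `lookupTerm("fig")` is `1`.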

The current implementation for diskdv is the simplest thing that could possibly work: under
the hood it just makes a binary DV for these (treating ordinals as document ids). When the
terms are fixed length, you can address a term directly with multiplication. When they are
variable length, though, we have to keep a packed ints structure of per-term start addresses in RAM.
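
The current layout can be sketched as follows, assuming terms arrive already sorted in ord order. All term bytes are concatenated. Fixed-length terms are located by multiplying the ordinal by the length. Variable-length terms need one start address per ordinal held in memory. `OrdAddressedTerms` is a hypothetical name, and a plain `long[]` stands in for the packed ints structure.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a "binary DV keyed by ordinal": terms in ord order, bytes concatenated.
final class OrdAddressedTerms {
  private final byte[] data;      // all term bytes, concatenated in ord order
  private final int fixedLength;  // >= 0 if every term has the same length, else -1
  private final long[] addresses; // variable length only: start of each ord, plus an end sentinel

  OrdAddressedTerms(String... sortedTerms) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    long[] addr = new long[sortedTerms.length + 1];
    int len = -2; // -2 = not seen yet, -1 = lengths differ
    for (int ord = 0; ord < sortedTerms.length; ord++) {
      byte[] b = sortedTerms[ord].getBytes(StandardCharsets.UTF_8);
      addr[ord] = out.size();
      out.write(b, 0, b.length);
      if (len == -2) {
        len = b.length;
      } else if (len != b.length) {
        len = -1;
      }
    }
    addr[sortedTerms.length] = out.size();
    this.data = out.toByteArray();
    this.fixedLength = len == -1 ? -1 : Math.max(len, 0);
    // Fixed length needs no per-term table; variable length keeps one entry per term in RAM.
    this.addresses = fixedLength >= 0 ? null : addr;
  }

  String lookupOrd(int ord) {
    int start;
    int length;
    if (fixedLength >= 0) {
      start = ord * fixedLength; // direct addressing by multiplication
      length = fixedLength;
    } else {
      start = (int) addresses[ord]; // one in-RAM address per unique term
      length = (int) (addresses[ord + 1] - start);
    }
    return new String(data, start, length, StandardCharsets.UTF_8);
  }

  // Number of in-RAM address entries (0 in the fixed-length case).
  int addressEntries() {
    return addresses == null ? 0 : addresses.length;
  }
}
```

In this sketch, `addressEntries()` grows linearly with the number of unique values. That is the RAM cost described next.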

This variable-length case is overkill and chews up a lot of RAM if you have many unique values (one address per term).
It also chews up a lot of disk, since all the values are just concatenated (no sharing of common prefixes between adjacent sorted terms).
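
One common way to address both costs is block-based front coding (prefix compression). It is sketched here as an assumption, not as the design chosen for this issue. Sorted, unique terms are split into blocks of `BLOCK_SIZE` terms; 16 is an arbitrary choice here. Each block's first term is stored in full. Every other term stores only the length of the prefix it shares with the previous term, plus the remaining suffix. Only one address per block stays in RAM. `FrontCodedTerms` and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical front-coded term dictionary; input terms must be sorted and unique.
final class FrontCodedTerms {
  static final int BLOCK_SIZE = 16; // arbitrary block size for this sketch
  private final byte[] data;
  private final int[] blockStarts;  // one in-RAM entry per block instead of one per term
  private final int valueCount;

  FrontCodedTerms(String... sortedTerms) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    blockStarts = new int[(sortedTerms.length + BLOCK_SIZE - 1) / BLOCK_SIZE];
    byte[] prev = new byte[0];
    for (int ord = 0; ord < sortedTerms.length; ord++) {
      byte[] term = sortedTerms[ord].getBytes(StandardCharsets.UTF_8);
      int shared = 0;
      if (ord % BLOCK_SIZE == 0) {
        blockStarts[ord / BLOCK_SIZE] = out.size(); // block head is stored in full
      } else {
        int max = Math.min(prev.length, term.length);
        while (shared < max && prev[shared] == term[shared]) {
          shared++;
        }
      }
      writeVInt(out, shared);               // bytes shared with the previous term
      writeVInt(out, term.length - shared); // suffix length
      out.write(term, shared, term.length - shared);
      prev = term;
    }
    data = out.toByteArray();
    valueCount = sortedTerms.length;
  }

  // Decodes the first `count` terms of a block, starting from its fully stored head.
  private String[] decode(int block, int count) {
    int[] pos = {blockStarts[block]};
    String[] terms = new String[count];
    byte[] term = new byte[0];
    for (int i = 0; i < count; i++) {
      int shared = readVInt(pos);
      int suffix = readVInt(pos);
      byte[] next = Arrays.copyOf(term, shared + suffix); // keeps the shared prefix
      System.arraycopy(data, pos[0], next, shared, suffix);
      pos[0] += suffix;
      term = next;
      terms[i] = new String(term, StandardCharsets.UTF_8);
    }
    return terms;
  }

  // ord -> term: jump to the block, then decode at most BLOCK_SIZE entries.
  String lookupOrd(int ord) {
    return decode(ord / BLOCK_SIZE, ord % BLOCK_SIZE + 1)[ord % BLOCK_SIZE];
  }

  // term -> ord: binary search over block heads, then scan one block; -1 if absent.
  int lookupTerm(String key) {
    int lo = 0;
    int hi = blockStarts.length - 1;
    int block = -1;
    while (lo <= hi) { // find the last block whose head is <= key
      int mid = (lo + hi) >>> 1;
      if (decode(mid, 1)[0].compareTo(key) <= 0) {
        block = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    if (block < 0) {
      return -1;
    }
    int count = Math.min(BLOCK_SIZE, valueCount - block * BLOCK_SIZE);
    String[] terms = decode(block, count);
    for (int i = 0; i < count; i++) {
      if (terms[i].equals(key)) {
        return block * BLOCK_SIZE + i;
      }
    }
    return -1;
  }

  int blockCount() { return blockStarts.length; }

  int encodedBytes() { return data.length; }

  private static void writeVInt(ByteArrayOutputStream out, int v) {
    while ((v & ~0x7F) != 0) { // 7 bits per byte, high bit = "more bytes follow"
      out.write((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out.write(v);
  }

  private int readVInt(int[] pos) {
    int value = 0;
    for (int shift = 0; ; shift += 7) {
      byte b = data[pos[0]++];
      value |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return value;
      }
    }
  }
}
```

The trade-off is extra CPU per lookup. Ord -> term decodes up to `BLOCK_SIZE` entries. Term -> ord binary-searches the block heads and then scans a single block. In exchange, RAM holds one address per block, and adjacent terms with long common prefixes are stored mostly as short suffixes.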

This message is automatically generated by JIRA.
