lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4764) Faster but more RAM/Disk consuming DocValuesFormat for facets
Date Tue, 12 Feb 2013 09:23:13 GMT


Shai Erera commented on LUCENE-4764:

Facets42Codec has a nocommit about handling multiple category lists, as well as the case
where the default field has changed. Currently (in the patch), the field is hard-coded to
"$facets", but that won't work if e.g. the app indexed categories into a different field.

Talking with Mike about it yesterday, I thought that what needs to be done is for the codec
to receive the FacetIndexingParams, build a HashSet<String> of all fields that hold
facets, and then use it in .getDocValuesFormatForField.
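
To make the idea above concrete, here is a minimal sketch of the per-field dispatch. It deliberately does not use the Lucene classes: FacetFieldFormatChooser and the two format names are hypothetical stand-ins, and the facet field names would in practice come from the FacetIndexingParams rather than from a varargs constructor.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the codec's per-field choice; not the Lucene API. */
class FacetFieldFormatChooser {
    // Placeholder format names, used for illustration only.
    static final String FACET_FORMAT = "FacetsDV";
    static final String DEFAULT_FORMAT = "DefaultDV";

    // All fields that hold facets (assumed to be derived from the indexing params).
    private final Set<String> facetFields;

    FacetFieldFormatChooser(String... facetFields) {
        this.facetFields =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList(facetFields)));
    }

    /** Mirrors the role of getDocValuesFormatForField: pick a format by field name. */
    String getDocValuesFormatForField(String field) {
        return facetFields.contains(field) ? FACET_FORMAT : DEFAULT_FORMAT;
    }
}
```

With this shape, an app that indexes categories into "$author" as well as "$facets" would get the facet format for both fields and the default format for everything else.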

However, I realized later that this is not doable, since Codecs must have a default constructor,
and b/c of how they are initialized, they cannot rely on stuff passed to them in the ctor
(e.g. when they are initialized by a reader?). Is that true? I looked at a few Codec impls,
and it looks like none relies on stuff passed to it in the ctor.
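
The concern can be illustrated with a toy model. This is only a sketch under the assumption that the read side resolves a codec by its recorded name and builds it through the no-arg constructor; SketchCodec and SketchCodecRegistry are hypothetical names, not Lucene classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical codec model; only its name is assumed to be recorded in the index. */
class SketchCodec {
    final String name;
    final Set<String> facetFields;

    // No-arg constructor: the only one a name-based lookup can call.
    SketchCodec() {
        this(Collections.singleton("$facets"));
    }

    // Extra constructor that only the indexing application knows how to call.
    SketchCodec(Set<String> facetFields) {
        this.name = "SketchFacets";
        this.facetFields = new HashSet<>(facetFields);
    }
}

/** Hypothetical name-based registry standing in for the reader-side lookup. */
class SketchCodecRegistry {
    private static final Map<String, Supplier<SketchCodec>> BY_NAME = new HashMap<>();

    static {
        // The method reference resolves to the no-arg constructor.
        BY_NAME.put("SketchFacets", SketchCodec::new);
    }

    static SketchCodec forName(String name) {
        Supplier<SketchCodec> factory = BY_NAME.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("unknown codec: " + name);
        }
        return factory.get();
    }
}
```

In this model, a codec built at index time with "$myfacets" and later resolved via SketchCodecRegistry.forName("SketchFacets") comes back knowing only about "$facets": whatever was passed to the ctor is not visible on the read side.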

If so, perhaps we should also override the FieldInfosFormat and use it to detect which fields
are "facet" fields? E.g. it will be a subset of all fields that have BinaryDV. But that's
not distinguishing enough ... and we cannot add a DVType, so cannot distinguish BINARY from
FACETS_BINARY even if we wanted to make a different BinaryDV extension ...

Crazy, but can we write a boolean {{hasFacets}} to FieldInfo? Is that supported if we e.g.
extend (I realize, many) classes?

> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>                 Key: LUCENE-4764
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>         Attachments: LUCENE-4764.patch
> The new default DV format for binary fields has much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.
