lucene-dev mailing list archives

From "Michael McCandless (Commented) (JIRA)"
Subject [jira] [Commented] (LUCENE-3504) DocValues: deref/sorted bytes types shouldn't return empty byte[] when doc didn't have a value
Date Tue, 11 Oct 2011 10:51:11 GMT


Michael McCandless commented on LUCENE-3504:

So this would mean doc values can never support the notion of a
"missing value" for a document, right?

Ie, this is more limited than FieldCache.

So it's the app's job to always index a doc value for every document,
else the behavior is hardwired at search time (0 for numerics, new
byte[0] for var-length bytes, zero bytes for fixed-length bytes).

I guess if for some reason an app really has a problem with this, it
could go and store its own "single bit docvalues field" (eg int
field with only 0 and 1 values) to indicate "missing-ness", and then
at sort time, sort first by this field and second by the "normal" sort
field(s).  This would let you sort missing first or last, at least.
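The two-level sort described above can be sketched in plain Java. This is only an illustration under stated assumptions: the arrays `hasValue` and `value` are hypothetical stand-ins for the two per-document doc-values fields, and `MissingAwareSort` and `sortDocs` are made-up names, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MissingAwareSort {
    // hasValue[doc]: the extra "single bit" field (hypothetical), 1 if the app
    //                indexed a real value for the doc, 0 otherwise.
    // value[doc]:    the normal sort field; docs without a value read back as
    //                the hardwired default 0.
    static List<Integer> sortDocs(int[] hasValue, long[] value, boolean missingFirst) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < value.length; doc++) {
            docs.add(doc);
        }

        // Primary key: the missing-ness bit. Ascending puts 0 (missing) first;
        // reversing it puts missing docs last.
        Comparator<Integer> byMissing = Comparator.comparingInt(doc -> hasValue[doc]);
        Comparator<Integer> primary = missingFirst ? byMissing : byMissing.reversed();

        // Secondary key: the normal sort field, ascending.
        docs.sort(primary.thenComparingLong(doc -> value[doc]));
        return docs;
    }

    public static void main(String[] args) {
        int[] hasValue = {1, 0, 1, 1};   // doc 1 never had a value indexed
        long[] value = {5L, 0L, 0L, 3L}; // doc 1 reads as default 0, doc 2 is a real 0

        System.out.println(sortDocs(hasValue, value, true));  // [1, 2, 3, 0]
        System.out.println(sortDocs(hasValue, value, false)); // [2, 3, 0, 1]
    }
}
```

Note that doc 1 (missing) and doc 2 (a real 0) both read back as 0 from the value field; only the extra bit field separates them.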

OK, I actually like this approach: it's stricter than FieldCache.

The app is not allowed to skip documents when making a doc-values
field, or if it does, it must accept the hardwired defaults we return
for such documents.

> DocValues: deref/sorted bytes types shouldn't return empty byte[] when doc didn't have a value
> ----------------------------------------------------------------------------------------------
>                 Key: LUCENE-3504
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
> I'm looking at making a FieldComparator that uses DV's SortedSource to
> sort by string field (ie just like TermOrdValComparator, except using
> DV instead of FieldCache).  We already have comparators for DV int and
> float fields.
> But one thing I noticed is we can't distinguish documents that didn't have
> any value indexed from documents that had an empty byte[] indexed.
> This is easy to fix (and we used to do this): because these types are
> deref'd (ie, each doc stores an address, and then separately looks up
> the byte[] at that address), we can reserve ord/address 0 to mean "doc
> didn't have the field".  Then we should return null when you retrieve
> the BytesRef value for that field.
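The reserved-ord idea in the issue description can be sketched as follows. This is a minimal in-memory illustration, not the actual DocValues implementation: `DerefBytesSketch`, `set` and `get` are hypothetical names, plain `byte[]` stands in for BytesRef, and unlike a real deref/sorted source it does not deduplicate values or sort them.

```java
import java.util.ArrayList;
import java.util.List;

class DerefBytesSketch {
    // valuesByOrd.get(ord) is the byte[] stored at that ord/address.
    private final List<byte[]> valuesByOrd = new ArrayList<>();
    // docToOrd[doc] is the ord stored for each document.
    private final int[] docToOrd;

    DerefBytesSketch(int maxDoc) {
        // Java zero-fills the array, so every doc starts at ord 0.
        docToOrd = new int[maxDoc];
        // Ord 0 is reserved to mean "doc didn't have the field".
        valuesByOrd.add(null);
    }

    void set(int doc, byte[] value) {
        // Real values always get an ord >= 1, even an empty byte[].
        valuesByOrd.add(value.clone());
        docToOrd[doc] = valuesByOrd.size() - 1;
    }

    // Returns null for a doc that never had a value, and a (possibly empty)
    // byte[] for a doc that did.
    byte[] get(int doc) {
        return valuesByOrd.get(docToOrd[doc]);
    }

    public static void main(String[] args) {
        DerefBytesSketch dv = new DerefBytesSketch(3);
        dv.set(1, new byte[0]);       // doc 1: empty value indexed
        dv.set(2, new byte[] {42});   // doc 2: one-byte value indexed

        System.out.println(dv.get(0) == null);   // true: doc 0 has no value
        System.out.println(dv.get(1).length);    // 0: empty but present
        System.out.println(dv.get(2)[0]);        // 42
    }
}
```

With ord 0 reserved, "no value" and "empty value" become distinguishable at no extra per-document cost, because each doc already stores an ord.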
