lucene-dev mailing list archives

From "Simon Willnauer (Commented) (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3504) DocValues: deref/sorted bytes types shouldn't return empty byte[] when doc didn't have a value
Date Tue, 11 Oct 2011 07:58:29 GMT


Simon Willnauer commented on LUCENE-3504:

Mike, let me explain my intention here. You are right that we used to do this, but:
 * IDV is strictly dense storage, i.e. each document has a value; that is the basic assumption.
 * If you want a default value you should specify it. If you don't specify one, we make a best
effort to provide it for you.
 * Consistency is very important here: all variants return a value for every doc. For numerics
it's 0 / 0.0; for bytes it's a BytesRef initialized with the default, depending on the variant (var/fixed).
 * The null invariant forces users to do a check for every document, which makes no sense given
the first assumption.
 * If you have a numeric value you can't check for missing values, since those values are primitives;
again, consistency.
I think we should not copy the behavior from FC here, for the above reasons. What we should
rather do is make this absolutely clear: remove the return value from getBytes(BR) and
document that the BR will always be filled. If you want some "missing value" behavior,
you should make sure you add the right values. The sort missing last/first stuff seems like
something born from the fact that we build FC by uninverting an indexed field, and IDV doesn't
have this limitation.
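
To make the dense contract concrete, here is a rough sketch of what I mean. It is not Lucene code: DenseFixedBytesSketch, set() and this getBytes() signature are made-up names for illustration, a plain byte[] stands in for BytesRef, and the "default" for the fixed variant is assumed to be a zero-filled value of the fixed width.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a dense, fixed-width per-document byte store.
// None of these names are Lucene API; they only illustrate the contract.
final class DenseFixedBytesSketch {
    private final int width;
    private final byte[] data; // maxDoc * width bytes, zero-filled on allocation

    DenseFixedBytesSketch(int maxDoc, int width) {
        this.width = width;
        this.data = new byte[maxDoc * width];
    }

    void set(int docId, byte[] value) {
        if (value.length != width) {
            throw new IllegalArgumentException("expected " + width + " bytes");
        }
        System.arraycopy(value, 0, data, docId * width, width);
    }

    // Fills the caller's buffer and returns nothing, so there is no null to check.
    void getBytes(int docId, byte[] reuse) {
        System.arraycopy(data, docId * width, reuse, 0, width);
    }

    public static void main(String[] args) {
        // Sentinel chosen by the application (an assumption of this sketch).
        byte[] missing = "N/A ".getBytes(StandardCharsets.US_ASCII);

        DenseFixedBytesSketch dv = new DenseFixedBytesSketch(3, 4);
        dv.set(0, "abcd".getBytes(StandardCharsets.US_ASCII));
        dv.set(1, missing); // the application records "missing" explicitly
        // doc 2 is never set and reads back as the zero-filled default

        byte[] buf = new byte[4];
        for (int doc = 0; doc < 3; doc++) {
            dv.getBytes(doc, buf);
            System.out.println(doc + " -> " + Arrays.toString(buf)
                + (Arrays.equals(buf, missing) ? " (missing)" : ""));
        }
    }
}
```

The point of the sketch is only that the reader never branches on null: a caller who needs "missing" semantics writes a sentinel of their own choosing and compares against it.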
> DocValues: deref/sorted bytes types shouldn't return empty byte[] when doc didn't have a value
> ----------------------------------------------------------------------------------------------
>                 Key: LUCENE-3504
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
> I'm looking at making a FieldComparator that uses DV's SortedSource to
> sort by string field (ie just like TermOrdValComparator, except using
> DV instead of FieldCache).  We already have comparators for DV int and
> float DV fields.
> But one thing I noticed is we can't detect documents that didn't have
> any value indexed vs documents that had empty byte[] indexed.
> This is easy to fix (and we used to do this): because these types are
> deref'd (ie, each doc stores an address, and then separately looks up
> the byte[] at that address), we can reserve ord/address 0 to mean "doc
> didn't have the field".  Then we should return null when you retrieve
> the BytesRef value for that field.
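
For comparison, the reserved-address idea in the issue description could look roughly like the following. Again this is only a sketch under assumptions: DerefBytesSketch and its methods are hypothetical names, not the DocValues API, and unlike a real deref store it does not share identical values between documents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical deref-style store: each doc holds an ord into a value pool.
// Ord 0 is reserved to mean "this doc never had the field".
final class DerefBytesSketch {
    private final int[] ords;                           // all 0 on allocation
    private final List<byte[]> pool = new ArrayList<>();

    DerefBytesSketch(int maxDoc) {
        this.ords = new int[maxDoc];
        pool.add(null); // slot 0 is the reserved "no value" entry
    }

    void set(int docId, byte[] value) {
        pool.add(value.clone());
        ords[docId] = pool.size() - 1; // always >= 1 for a real value
    }

    // Returns null for docs without a value, and the stored bytes otherwise,
    // so an empty byte[] stays distinguishable from "missing".
    byte[] get(int docId) {
        int ord = ords[docId];
        return ord == 0 ? null : pool.get(ord);
    }

    public static void main(String[] args) {
        DerefBytesSketch dv = new DerefBytesSketch(2);
        dv.set(0, new byte[0]); // doc 0 indexed an empty value
        // doc 1 indexed nothing
        System.out.println("doc 0 missing? " + (dv.get(0) == null));
        System.out.println("doc 1 missing? " + (dv.get(1) == null));
    }
}
```

This is the variant that requires a null check on every lookup, which is the trade-off the comment above argues against.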
