Date: Tue, 11 Oct 2011 07:58:29 +0000 (UTC)
From: "Simon Willnauer (Commented) (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <1265337387.18234.1318319909758.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <547263633.16623.1318284029909.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (LUCENE-3504) DocValues: deref/sorted bytes types shouldn't return empty byte[] when doc didn't have a value

[
https://issues.apache.org/jira/browse/LUCENE-3504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124789#comment-13124789 ]

Simon Willnauer commented on LUCENE-3504:
-----------------------------------------

Mike, let me explain my intention here. You are right that we used to do this, but:

* IDV is strictly dense storage, i.e. each document has a value; that is the basic assumption.
* If you want a default value, you should specify it. If you don't specify one, we make a best effort to provide it for you.
* Consistency is very important here: all variants return a value for every doc. For numerics it's 0 / 0.0; for bytes it's a BytesRef initialized with the default for the variant (var/fixed).
* The null invariant forces users to do a check for every document, which makes no sense given the first assumption.
* If you have a numeric value, you can't check for missing values, since those values are primitives. Again, consistency.

I think we should not copy the behavior from FC here, for the above reasons. What we should rather do is make this absolutely clear: remove the return value from getBytes(BR) and document that the BR will always be filled. If you want some "missing value" behavior, you should make sure you add the right values. The sort missing last/first stuff seems like something born from the fact that we build FC by uninverting an indexed field, and IDV doesn't have this limitation.

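To make the dense-storage argument above concrete, here is a minimal sketch in plain Java. It is not Lucene code: DenseDocValuesSketch, Slice, DenseBytesSource, MISSING and isMissing are hypothetical names invented for illustration, and Slice only stands in for a reusable BytesRef-like holder. It assumes the application picks its own marker for "no value" and supplies it explicitly as the default.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch; none of these types are Lucene classes. */
class DenseDocValuesSketch {

    /** Stand-in for a reusable byte holder (plays the role of a BytesRef). */
    static final class Slice {
        byte[] bytes = new byte[0];
        int length;
    }

    /** Dense per-document bytes storage: every doc ID in [0, maxDoc) has a value. */
    static final class DenseBytesSource {
        private final byte[][] values;

        DenseBytesSource(int maxDoc, byte[] defaultValue) {
            values = new byte[maxDoc][];
            // Docs that are never set keep the caller-chosen default.
            Arrays.fill(values, defaultValue);
        }

        void set(int docID, byte[] value) {
            values[docID] = value;
        }

        /** Always fills the holder: no return value, no null case to check. */
        void getBytes(int docID, Slice reuse) {
            reuse.bytes = values[docID];
            reuse.length = values[docID].length;
        }
    }

    /** Application-chosen marker for "no value" (an assumption of this sketch). */
    static final byte[] MISSING = new byte[] {(byte) 0xFF};

    static boolean isMissing(Slice slice) {
        return Arrays.equals(Arrays.copyOf(slice.bytes, slice.length), MISSING);
    }

    public static void main(String[] args) {
        DenseBytesSource source = new DenseBytesSource(3, MISSING);
        source.set(0, "foo".getBytes(StandardCharsets.UTF_8));
        source.set(2, new byte[0]); // an explicitly stored empty value

        Slice reuse = new Slice();
        for (int doc = 0; doc < 3; doc++) {
            source.getBytes(doc, reuse);
            String label = isMissing(reuse) ? "missing" : "length " + reuse.length;
            System.out.println(doc + " -> " + label);
        }
    }
}
```

With this shape the caller never does a null check, and a doc with an explicitly stored empty byte[] stays distinguishable from a doc that was never given a value only because the application chose a non-empty marker.
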
> DocValues: deref/sorted bytes types shouldn't return empty byte[] when doc didn't have a value
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3504
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3504
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>
> I'm looking at making a FieldComparator that uses DV's SortedSource to
> sort by string field (i.e. just like TermOrdValComparator, except using
> DV instead of FieldCache). We already have comparators for DV int and
> float fields.
> But one thing I noticed is that we can't distinguish documents that didn't
> have any value indexed from documents that had an empty byte[] indexed.
> This is easy to fix (and we used to do this): because these types are
> deref'd (i.e. each doc stores an address, and then separately looks up
> the byte[] at that address), we can reserve ord/address 0 to mean "doc
> didn't have the field". Then we should return null when you retrieve
> the BytesRef value for that field.

--
This message is automatically generated by JIRA.