From: Sean Bridges
Date: Wed, 21 Aug 2013 08:30:26 -0700
Subject: Re: problem found with DiskDocValuesFormat
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=047d7b33daf2180a9204e476dc0f

--047d7b33daf2180a9204e476dc0f
Content-Type: text/plain; charset=ISO-8859-1

What is the recommended way to use DiskDocValuesFormat in production if we
can't reindex when we upgrade? Will the 4.4 version of DDVF be backwards
compatible, or should we make our own copy of DDVF and give it a different
codec name to protect ourselves against incompatible changes?

Thanks,
Sean


On Tue, Aug 13, 2013 at 4:34 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> DiskDVFormat does not have index back compatibility between minor
> releases; maybe that's what you are seeing? So, you must fully
> re-index any DiskDVFormat field after upgrading ...
>
> Only the default formats support index back compatibility between releases.
>
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Aug 13, 2013 at 4:54 AM, Duke DAI wrote:
> > Hi experts,
> >
> > I'm upgrading to Lucene 4.4 and trying to use DocValues instead of
> > stored fields for performance reasons. Because the index size is
> > unknown (it depends on the customer), I will use DiskDocValuesFormat,
> > especially for some binary fields.
> > Then I wrote my customized Codec:
> >
> > final Codec codec = new Lucene42Codec() {
> >
> >     private final Lucene42DocValuesFormat memoryDVFormat =
> >             new Lucene42DocValuesFormat();
> >     private final DiskDocValuesFormat diskDVFormat =
> >             new DiskDocValuesFormat();
> >
> >     @Override
> >     public DocValuesFormat getDocValuesFormatForField(String field) {
> >         if (LucenePluginConstants.INDEX_STORED_RETURNABLE_FIELD.equals(field)
> >                 || LucenePluginConstants.PAYLOAD_FIELD_NAME.equals(field)
> >                 || LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE.equals(field)) {
> >             return diskDVFormat;
> >         } else {
> >             return memoryDVFormat;
> >         }
> >     }
> > };
> > iwc.setCodec(codec);
> >
> > Here the field LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE is a
> > numeric field of type long; the others are binary.
> >
> > Then I consume the DocValues as in the pseudo-code below:
> >
> > nodeIDDocValuesSource =
> >     MultiDocValues.getNumericValues(searcher.getIndexReader(),
> >         LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE);
> >
> > ......
> > long nodeId = nodeIDDocValuesSource.get(scoreDoc.doc);
> >
> > I'm sure I then get a wrong nodeId, which the upper-layer logic
> > verifies and treats as data corruption.
> >
> > But if I switch to memoryDVFormat for the long field, everything is OK.
> >
> > Also, for upgrading legacy data, I keep two index formats, DV or stored
> > field, controlled by version. If I use stored fields, everything is OK.
> > So I guess there is a bug in DiskDocValuesFormat for the numeric data
> > type. Is it related to byte-aligned numeric compression?
> > Or did I use DiskDocValuesFormat incorrectly? There seem to be no other
> > parameters for it.
> >
> > Sorry that I have no pure Lucene test case yet. Hope someone can shed
> > some light on this.
> >
> >
> > Best regards,
> > Duke
> > If not now, when? If not me, who?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

--047d7b33daf2180a9204e476dc0f--
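
For readers following the thread, here is a minimal JDK-only sketch of the
per-field routing that the quoted codec performs. It arranges things the way
Duke reports as working: the binary fields go to the disk format and the
numeric node-id field falls through to the default. It uses no Lucene classes.
The class name FieldFormatRouter, the field-name strings, and the "Disk" and
"Lucene42" labels are assumptions made for illustration, not values taken from
the thread or guaranteed to match the Lucene API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** JDK-only sketch of per-field doc-values format routing; no Lucene classes are used. */
class FieldFormatRouter {

    // Hypothetical stand-ins for the LucenePluginConstants values, which the thread does not show.
    static final String STORED_RETURNABLE = "stored_returnable";
    static final String PAYLOAD = "payload";
    static final String NODE_ID = "node_id";

    // Assumed labels for the two formats; they are only strings in this sketch.
    static final String DISK = "Disk";
    static final String DEFAULT = "Lucene42";

    private final Set<String> diskFields;

    FieldFormatRouter(Set<String> diskFields) {
        // Defensive copy so later changes to the caller's set do not alter routing.
        this.diskFields = new HashSet<>(diskFields);
    }

    /** Returns the format label for a field: DISK if listed, DEFAULT otherwise. */
    String formatNameFor(String field) {
        return diskFields.contains(field) ? DISK : DEFAULT;
    }

    public static void main(String[] args) {
        // Only the binary fields are routed to disk; the numeric node-id field is not listed.
        FieldFormatRouter router =
                new FieldFormatRouter(new HashSet<>(Arrays.asList(STORED_RETURNABLE, PAYLOAD)));
        for (String field : new String[] {STORED_RETURNABLE, PAYLOAD, NODE_ID, "title"}) {
            System.out.println(field + " -> " + router.formatNameFor(field));
        }
    }
}
```

In a real codec the same decision would sit inside getDocValuesFormatForField
and return format instances rather than labels. Keeping the disk-backed fields
in one explicit set makes it easier to see which fields would need a full
re-index after an upgrade.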