lucene-java-user mailing list archives

From Sean Bridges <sean.brid...@gmail.com>
Subject Re: problem found with DiskDocValuesFormat
Date Wed, 21 Aug 2013 15:30:26 GMT
What is the recommended way to use DiskDocValuesFormat in production if we
can't reindex when we upgrade?

Will the 4.4 version of DiskDocValuesFormat (DDVF) be backwards compatible
with later releases, or should we make our own copy of DDVF and give it a
different codec name to protect ourselves against incompatible changes?

Thanks,

Sean


On Tue, Aug 13, 2013 at 4:34 AM, Michael McCandless <lucene@mikemccandless.com> wrote:

> DiskDVFormat does not have index back compatibility between minor
> releases; maybe that's what you are seeing?  So, you must fully
> re-index any DiskDVFormat fields after upgrading ...
>
> Only the default formats support index back compatibility between releases.
>
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Aug 13, 2013 at 4:54 AM, Duke DAI <duke.dai.007@gmail.com> wrote:
> > Hi experts,
> >
> > I'm upgrading to Lucene 4.4 and trying to use DocValues instead of stored
> > fields for performance reasons. Because the index size is unknown (it
> > depends on the customer), I will use DiskDocValuesFormat, especially for
> > some binary fields. So I wrote a customized Codec:
> >
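> >       import org.apache.lucene.codecs.Codec;
> >       import org.apache.lucene.codecs.DocValuesFormat;
> >       import org.apache.lucene.codecs.diskdv.DiskDocValuesFormat;
> >       import org.apache.lucene.codecs.lucene42.Lucene42Codec;
> >       import org.apache.lucene.codecs.lucene42.Lucene42DocValuesFormat;
> >
> >       final Codec codec = new Lucene42Codec() {
> >
> >         // Default Lucene 4.2+ format, used for all other fields.
> >         private final Lucene42DocValuesFormat memoryDVFormat =
> >             new Lucene42DocValuesFormat();
> >         // Disk-based format for the binary fields and the node-ID field.
> >         private final DiskDocValuesFormat diskDVFormat =
> >             new DiskDocValuesFormat();
> >
> >         @Override
> >         public DocValuesFormat getDocValuesFormatForField(String field) {
> >           if (LucenePluginConstants.INDEX_STORED_RETURNABLE_FIELD.equals(field)
> >               || LucenePluginConstants.PAYLOAD_FIELD_NAME.equals(field)
> >               || LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE.equals(field)) {
> >             return diskDVFormat;
> >           } else {
> >             return memoryDVFormat;
> >           }
> >         }
> >       };
> >       iwc.setCodec(codec); // iwc is the IndexWriterConfig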
> >
> > Here the field LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE is a numeric
> > field of type long; the others are binary.
> >
> > Then I consume the DocValues as in this pseudo-code:
> >     NumericDocValues nodeIDDocValuesSource =
> >         MultiDocValues.getNumericValues(searcher.getIndexReader(),
> >             LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE);
> >
> >     ......
> >     long nodeId = nodeIDDocValuesSource.get(scoreDoc.doc);
> >
> > I then get a wrong nodeId; I'm sure of this because the upper-layer logic
> > verifies it and treats it as data corruption.
> >
> >
> > But if I switch the long field to memoryDVFormat, everything is OK.
> >
> > Also, to upgrade legacy data, I keep two index formats, DocValues or stored
> > fields, selected by version. If I use stored fields, everything is OK.
> > So I suspect a bug in DiskDocValuesFormat for the numeric type; could it
> > be related to byte-aligned numeric compression?
> > Or am I using DiskDocValuesFormat incorrectly? It doesn't seem to take any
> > other parameters.
> >
> > Sorry that I don't have a pure Lucene test case yet. I hope someone can
> > shed some light on this.
> >
> >
> >
> >
> > Best regards,
> > Duke
> > If not now, when? If not me, who?
>
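MultiDocValues.getNumericValues, used in the quoted pseudo-code, presents every segment of the reader as a single NumericDocValues, so a top-level docID such as scoreDoc.doc has to be resolved to a segment and a segment-local docID before a value can be read. The plain-Java sketch below illustrates that mapping with a binary search over the segments' doc bases. It is an illustration under that assumption, not Lucene's code; SegmentDocMapper, segmentOf and localDoc are hypothetical names.

```java
/**
 * Hypothetical sketch (not Lucene code): map a top-level docID to the
 * segment that holds it and to that segment's local docID.
 */
final class SegmentDocMapper {
  // docStarts[i] is the doc base of segment i; the last entry is the total doc count.
  private final int[] docStarts;

  SegmentDocMapper(int[] segmentMaxDocs) {
    docStarts = new int[segmentMaxDocs.length + 1];
    for (int i = 0; i < segmentMaxDocs.length; i++) {
      docStarts[i + 1] = docStarts[i] + segmentMaxDocs[i];
    }
  }

  /** Index of the segment containing top-level docID {@code doc}. */
  public int segmentOf(int doc) {
    if (doc < 0 || doc >= docStarts[docStarts.length - 1]) {
      throw new IllegalArgumentException("docID out of range: " + doc);
    }
    // Binary search for the last segment whose doc base is <= doc;
    // empty segments share a doc base with their successor and are skipped.
    int lo = 0;
    int hi = docStarts.length - 2;
    while (lo < hi) {
      int mid = (lo + hi + 1) >>> 1;
      if (docStarts[mid] <= doc) {
        lo = mid;
      } else {
        hi = mid - 1;
      }
    }
    return lo;
  }

  /** Segment-local docID for top-level docID {@code doc}. */
  public int localDoc(int doc) {
    return doc - docStarts[segmentOf(doc)];
  }
}
```

For example, with segments of 3, 0 and 2 documents, top-level docID 3 falls in the third segment at local docID 0, because the empty middle segment is skipped.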

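Duke asks whether the problem could relate to byte-aligned numeric compression. As a generic illustration only (the thread does not describe DiskDocValuesFormat's actual on-disk encoding, and this sketch does not reproduce it), the class below stores each long as its delta from the minimum value, using the smallest whole number of bytes that fits the largest delta. ByteAlignedDeltaCodec and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical, illustrative byte-aligned delta encoding for longs.
 * NOT DiskDocValuesFormat's encoding.
 * Layout: [min: 8 bytes][width: 1 byte][count: 4 bytes][count * width bytes]
 */
final class ByteAlignedDeltaCodec {

  /** Smallest whole number of bytes (at least 1) holding an unsigned delta. */
  public static int bytesRequired(long unsignedDelta) {
    int bits = 64 - Long.numberOfLeadingZeros(unsignedDelta);
    return Math.max(1, (bits + 7) / 8); // round bits up to a byte boundary
  }

  public static byte[] encode(long[] values) {
    long min = 0;
    long max = 0;
    if (values.length > 0) {
      min = Long.MAX_VALUE;
      max = Long.MIN_VALUE;
      for (long v : values) {
        min = Math.min(min, v);
        max = Math.max(max, v);
      }
    }
    // max - min may overflow a signed long, but read as unsigned it is exact.
    int width = bytesRequired(max - min);
    ByteBuffer buf = ByteBuffer.allocate(8 + 1 + 4 + values.length * width);
    buf.putLong(min).put((byte) width).putInt(values.length);
    for (long v : values) {
      long delta = v - min;
      for (int b = 0; b < width; b++) {
        buf.put((byte) (delta >>> (8 * b))); // least significant byte first
      }
    }
    return buf.array();
  }

  public static long[] decode(byte[] data) {
    ByteBuffer buf = ByteBuffer.wrap(data);
    long min = buf.getLong();
    int width = buf.get(); // always 1..8, so never negative
    int count = buf.getInt();
    long[] out = new long[count];
    for (int i = 0; i < count; i++) {
      long delta = 0;
      for (int b = 0; b < width; b++) {
        delta |= (buf.get() & 0xFFL) << (8 * b);
      }
      out[i] = min + delta; // two's-complement wraparound restores the value
    }
    return out;
  }
}
```

The decoder trusts the minimum and byte width recorded in the header, so data written in one layout can only be read back by code that understands exactly that layout. This is the general reason a format without back-compatibility guarantees forces a re-index after an upgrade.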