lucene-java-user mailing list archives

From Duke DAI <duke.dai....@gmail.com>
Subject problem found with DiskDocValuesFormat
Date Tue, 13 Aug 2013 08:54:23 GMT
Hi experts,

I'm upgrading to Lucene 4.4 and trying to use DocValues instead of stored
fields for performance reasons. Because the index size is unknown (it depends
on the customer), I want to use DiskDocValuesFormat, especially for some
binary fields. So I wrote a custom Codec:

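      import org.apache.lucene.codecs.Codec;
      import org.apache.lucene.codecs.DocValuesFormat;
      import org.apache.lucene.codecs.diskdv.DiskDocValuesFormat; // lucene-codecs module
      import org.apache.lucene.codecs.lucene42.Lucene42Codec;
      import org.apache.lucene.codecs.lucene42.Lucene42DocValuesFormat;

      // Default Lucene 4.2+ codec, with the doc values format chosen per field.
      final Codec codec = new Lucene42Codec() {
        // Default format: loads doc values into heap memory.
        private final Lucene42DocValuesFormat memoryDVFormat =
            new Lucene42DocValuesFormat();
        // Disk-based format: keeps most doc values data on disk.
        private final DiskDocValuesFormat diskDVFormat =
            new DiskDocValuesFormat();

        @Override
        public DocValuesFormat getDocValuesFormatForField(String field) {
          if (LucenePluginConstants.INDEX_STORED_RETURNABLE_FIELD.equals(field)
              || LucenePluginConstants.PAYLOAD_FIELD_NAME.equals(field)
              || LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE.equals(field)) {
            return diskDVFormat;
          } else {
            return memoryDVFormat;
          }
        }
      };
      iwc.setCodec(codec);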

Here LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE is a numeric field of type
long; the other two fields are binary.

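I then read the doc values roughly like this (pseudo-code):

    // Top-level view over all segments (null if no segment has this field)
    NumericDocValues nodeIDDocValuesSource =
        MultiDocValues.getNumericValues(searcher.getIndexReader(),
            LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE);

    ......
    // scoreDoc.doc must be a top-level docID of this same reader
    long nodeId = nodeIDDocValuesSource.get(scoreDoc.doc);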
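For context on what `get(scoreDoc.doc)` does here: as far as I understand Lucene 4.x, `MultiDocValues.getNumericValues` wraps each segment's `NumericDocValues` and maps a top-level docID to a segment via the segments' doc bases, then reads the segment-local doc. The stdlib-only sketch below models that mapping under that assumption; `MultiSegmentLookup` and its methods are hypothetical names, not Lucene API. It can help reason about whether a wrong value could come from resolving a docID against a different reader than the one that produced it.

```java
/**
 * Simplified model of a top-level doc values view over several segments.
 * Hypothetical class: it mirrors the idea behind MultiDocValues, not its source.
 */
public class MultiSegmentLookup {
    private final long[][] segmentValues; // per-segment values, indexed by segment-local docID
    private final int[] starts;           // starts[i] = docBase of segment i; last entry = maxDoc

    public MultiSegmentLookup(long[][] segmentValues) {
        this.segmentValues = segmentValues;
        this.starts = new int[segmentValues.length + 1];
        for (int i = 0; i < segmentValues.length; i++) {
            starts[i + 1] = starts[i] + segmentValues[i].length;
        }
    }

    /** Returns the last segment whose docBase is <= docID, so empty segments are never picked. */
    int subIndex(int docID) {
        int lo = 0, hi = segmentValues.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= docID) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Resolves a top-level docID to (segment, local docID) and returns its value. */
    public long get(int docID) {
        if (docID < 0 || docID >= starts[segmentValues.length]) {
            throw new IndexOutOfBoundsException("docID " + docID);
        }
        int seg = subIndex(docID);
        return segmentValues[seg][docID - starts[seg]];
    }

    public static void main(String[] args) {
        // Three hypothetical segments holding 2, 0 and 3 documents.
        MultiSegmentLookup dv = new MultiSegmentLookup(new long[][] {
            {100L, 101L}, {}, {200L, 201L, 202L}
        });
        System.out.println(dv.get(1)); // 101 (segment 0, local doc 1)
        System.out.println(dv.get(3)); // 201 (segment 2, local doc 1)
    }
}
```

Segments without documents produce duplicate start offsets, which is why `subIndex` searches for the last segment whose start is at most the docID rather than any matching one.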

I'm sure the nodeId I get back is wrong: the higher-level logic validates it
and treats it as data corruption.


But if I switch the long field to memoryDVFormat, everything works.
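One way to narrow this down toward a pure Lucene test case is to read the long value through two sources that should agree (for example, the doc values field and an equivalent stored field indexed alongside it) and list the docIDs where they differ. The helper below is a minimal, hypothetical sketch of that comparison (Java 8+). The two sources are abstracted as `IntToLongFunction`s; with Lucene 4.x, where `NumericDocValues.get(int)` returns a `long`, an instance could presumably be passed as `ndv::get`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToLongFunction;

/**
 * Hypothetical debugging helper: compares two per-document long sources that
 * should agree and returns the docIDs where they differ.
 */
public class DocValueDiff {
    public static List<Integer> mismatches(int maxDoc,
                                           IntToLongFunction expected,
                                           IntToLongFunction actual) {
        List<Integer> bad = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            // Deleted documents are not skipped in this sketch.
            if (expected.applyAsLong(doc) != actual.applyAsLong(doc)) {
                bad.add(doc);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        long[] storedNodeIds = {10L, 20L, 30L, 40L};
        long[] docValueNodeIds = {10L, 20L, 31L, 40L}; // doc 2 deliberately wrong
        System.out.println(mismatches(storedNodeIds.length,
            doc -> storedNodeIds[doc], doc -> docValueNodeIds[doc])); // [2]
    }
}
```

Knowing whether the bad docIDs cluster in one segment or are spread across the index would make a bug report much more concrete.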

Also, to support upgrading legacy data, I keep two index formats, doc values
or stored fields, selected by version. With stored fields, everything works.
So I suspect a bug in DiskDocValuesFormat's handling of numeric values. Is it
related to the byte-aligned numeric compression? Or am I using
DiskDocValuesFormat incorrectly? It doesn't seem to take any other parameters.
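On the compression question: Lucene's numeric doc values formats use packed-integer encodings, but the exact scheme inside `DiskDocValuesFormat` is not reproduced here. The toy sketch below only illustrates the general min/delta idea behind such encodings: store the minimum once, then each value's delta from it in a fixed number of bits, optionally rounded up to whole bytes ("byte-aligned"). `DeltaPack` is a hypothetical class, not Lucene code; the round-trip check in `main` is the kind of invariant a pure Lucene test would assert against the real format.

```java
import java.util.Arrays;

/** Toy min/delta packing of longs into fixed-width slots (not Lucene's format). */
public class DeltaPack {
    final long min;          // smallest value; every stored delta is relative to it
    final int bitsPerValue;  // fixed width of each packed delta
    final long[] blocks;     // deltas packed back to back into 64-bit words

    DeltaPack(long[] values, boolean byteAligned) {
        long mn = Long.MAX_VALUE, mx = Long.MIN_VALUE;
        for (long v : values) {
            mn = Math.min(mn, v);
            mx = Math.max(mx, v);
        }
        if (values.length == 0) {
            mn = 0;
            mx = 0;
        }
        min = mn;
        // Treat the range as unsigned so even MIN_VALUE..MAX_VALUE fits in 64 bits.
        long range = mx - mn;
        int bits = range == 0 ? 1 : 64 - Long.numberOfLeadingZeros(range);
        if (byteAligned) {
            bits = (bits + 7) & ~7; // round up to a whole number of bytes
        }
        bitsPerValue = bits;
        blocks = new long[(int) (((long) values.length * bits + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            long delta = values[i] - min; // unsigned difference, fits in `bits` bits
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            blocks[word] |= delta << shift;
            if (shift + bits > 64) { // the delta straddles two words
                blocks[word + 1] |= delta >>> (64 - shift);
            }
        }
    }

    long get(int i) {
        long bitPos = (long) i * bitsPerValue;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long delta = blocks[word] >>> shift;
        if (shift + bitsPerValue > 64) {
            delta |= blocks[word + 1] << (64 - shift);
        }
        long mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        return min + (delta & mask);
    }

    public static void main(String[] args) {
        long[] nodeIds = {1000L, 1003L, 1001L, 1007L, 1100L};
        for (boolean byteAligned : new boolean[] {false, true}) {
            DeltaPack p = new DeltaPack(nodeIds, byteAligned);
            long[] back = new long[nodeIds.length];
            for (int i = 0; i < back.length; i++) {
                back[i] = p.get(i);
            }
            System.out.println(p.bitsPerValue + " bits/value, round trip ok: "
                + Arrays.equals(nodeIds, back));
        }
    }
}
```

In a scheme like this, a bits-per-value computed too small would silently truncate deltas and return wrong values instead of throwing, which is why a round-trip assertion over real node IDs is a useful first test.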

Sorry that I don't have a pure Lucene test case yet. I hope someone can shed
some light on this.




Best regards,
Duke
If not now, when? If not me, who?
