lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: problem found with DiskDocValuesFormat
Date Mon, 21 Oct 2013 11:26:01 GMT
Can you describe what problem you are actually hitting?

The purpose of docValuesLocal is to hold the per-thread instance of
each doc values object, and to re-use it when that thread comes back
asking for the same doc values.
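
For illustration, the per-thread re-use pattern can be sketched in plain
Java. This is a minimal sketch and not the actual SegmentCoreReaders code:
the class PerThreadCacheSketch, its get method, and the "nodeId" field name
are hypothetical, and a plain Object stands in for a doc values instance.
It shows that a pooled thread gets back the same instance on a later
query, while a different thread gets its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical stand-in for a per-thread doc values cache; NOT Lucene source.
class PerThreadCacheSketch {

  // Each thread sees its own map from field name to the instance it loaded.
  private final ThreadLocal<Map<String, Object>> local =
      ThreadLocal.withInitial(HashMap::new);

  // Returns the calling thread's instance for the field, loading it once.
  Object get(String field, Function<String, Object> loader) {
    return local.get().computeIfAbsent(field, loader);
  }

  public static void main(String[] args) throws Exception {
    PerThreadCacheSketch cache = new PerThreadCacheSketch();
    // One lookup task, standing in for one "query" against the field.
    Callable<Object> query = () -> cache.get("nodeId", f -> new Object());

    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      // Two separate queries that run on the same pooled thread.
      Object first = pool.submit(query).get();
      Object second = pool.submit(query).get();
      // The main thread is a different thread, so it loads its own instance.
      Object mine = query.call();

      System.out.println("pooled thread re-used its instance: " + (first == second));
      System.out.println("main thread shares that instance: " + (first == mine));
    } finally {
      pool.shutdown();
    }
  }
}
```

Running main should print true for the first line and false for the
second: re-use is tied to the thread, not to the query.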

Mike McCandless

http://blog.mikemccandless.com


On Mon, Oct 21, 2013 at 6:28 AM, Duke DAI <duke.dai.007@gmail.com> wrote:
> Hi guys,
>
> It seems I have the same problem with Lucene45DocValuesFormat, but no
> problem with MemoryDocValuesFormat. The problem I encountered with Lucene
> 4.4 is with DiskDocValuesFormat, not with Lucene42DocValuesFormat.
>
> I dug into it a little and found the superficial cause. In
> SegmentCoreReaders there is a ThreadLocal variable, docValuesLocal. Its
> purpose is to avoid building the data structure repeatedly for each query
> thread. But what if the query thread comes from a thread pool and is
> reused for different queries?
> I removed docValuesLocal and built a lucene-core.jar; it works with my
> multi-threaded (thread pool) test cases.
>
> Do you have any idea about this? Is this enough information?
>
>
> Thanks,
> Duke
>
>
> Best regards,
> Duke
> If not now, when? If not me, who?
>
>
> On Tue, Aug 13, 2013 at 4:54 PM, Duke DAI <duke.dai.007@gmail.com> wrote:
>
>> Hi experts,
>>
>> I'm upgrading to Lucene 4.4 and trying to use DocValues instead of stored
>> fields for performance reasons. Because the index size is unknown (it
>> depends on the customer), I will use DiskDocValuesFormat, especially for
>> some binary fields. So I wrote my own customized Codec:
>>
>>       final Codec codec = new Lucene42Codec() {
>>
>>         private final Lucene42DocValuesFormat memoryDVFormat =
>>             new Lucene42DocValuesFormat();
>>         private final DiskDocValuesFormat diskDVFormat =
>>             new DiskDocValuesFormat();
>>
>>         @Override
>>         public DocValuesFormat getDocValuesFormatForField(String field) {
>>           // These three fields use the disk format; all others use memoryDVFormat.
>>           if (LucenePluginConstants.INDEX_STORED_RETURNABLE_FIELD.equals(field)
>>               || LucenePluginConstants.PAYLOAD_FIELD_NAME.equals(field)
>>               || LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE.equals(field)) {
>>             return diskDVFormat;
>>           } else {
>>             return memoryDVFormat;
>>           }
>>         }
>>       };
>>       iwc.setCodec(codec);
>>
>> Here the field LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE is a numeric
>> field of type long. The others are binary.
>>
>> Then I consume the doc values as in the pseudo-code below:
>>     nodeIDDocValuesSource =
>>             MultiDocValues.getNumericValues(searcher.getIndexReader(),
>>                 LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE);
>>
>>    ......
>>    long nodeId = nodeIDDocValuesSource.get(scoreDoc.doc);
>>
>> I'm sure I then get a wrong nodeId, which is checked by the upper-layer
>> logic and treated as data corruption.
>>
>>
>> But if I change to memoryDVFormat for the long type field, then everything
>> is OK.
>>
>> Also, to support upgrading legacy data, I keep two index formats, DV or
>> stored field, selected by version. If I use stored fields, everything is OK.
>> So I guess there is a bug in DiskDocValuesFormat with the numeric data
>> type. Is it related to byte-aligned numeric compression?
>> Or am I not using DiskDocValuesFormat correctly? It seems to have no other
>> parameters.
>>
>> Sorry that I have no pure Lucene test case yet. I hope someone can shed
>> some light on this.
>>
>>
>>
>>
>> Best regards,
>> Duke
>> If not now, when? If not me, who?
>>
