lucene-java-user mailing list archives

From Duke DAI <duke.dai....@gmail.com>
Subject Re: problem found with DiskDocValuesFormat
Date Tue, 22 Oct 2013 08:25:11 GMT
Thanks, Mike.

I finally figured out the root cause. I use a thread from Thread-Pool-1 to
probe the indexes of multiple collections in parallel, but the documents are
consumed by a thread from Thread-Pool-2. I was holding on to the same
DocValues object reference across both threads to read the values. Once I
accounted for the thread switch, the problem was resolved.
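
A minimal sketch of the mistake and the fix, using only the JDK and no
Lucene classes. StatefulValues, VALUES and both method names are
hypothetical stand-ins, not Lucene API: StatefulValues plays the role of a
per-thread doc-values instance, and the ThreadLocal plays the role of
docValuesLocal. One assumption to note: the stand-in throws when it is used
from the wrong thread so that the misuse is visible, whereas in the real
case the symptom was a silently wrong long value.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PerThreadValuesDemo {

    /** Hypothetical stand-in for a stateful, per-thread doc-values instance. */
    static final class StatefulValues {
        // Remember which thread created this instance.
        private final Thread owner = Thread.currentThread();

        long get(int docId) {
            // Fail loudly on cross-thread use (assumption made for the demo).
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("used from a thread that does not own it");
            }
            return docId * 10L;
        }
    }

    // One instance per thread, created lazily on first access from that thread.
    private static final ThreadLocal<StatefulValues> VALUES =
            ThreadLocal.withInitial(StatefulValues::new);

    /** The mistake: obtain the instance on the probe pool, read it on the consume pool. */
    static boolean sharedAcrossPoolsFails(int docId) {
        ExecutorService probePool = Executors.newSingleThreadExecutor();
        ExecutorService consumePool = Executors.newSingleThreadExecutor();
        try {
            Callable<StatefulValues> probe = VALUES::get;
            StatefulValues fromProbe = probePool.submit(probe).get();
            Callable<Long> consume = () -> fromProbe.get(docId);
            consumePool.submit(consume).get();
            return false; // not reached in this sketch
        } catch (ExecutionException e) {
            return e.getCause() instanceof IllegalStateException;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            probePool.shutdown();
            consumePool.shutdown();
        }
    }

    /** The fix: the consuming thread looks up its own instance before reading. */
    static long consumeWithOwnInstance(int docId) {
        ExecutorService consumePool = Executors.newSingleThreadExecutor();
        try {
            Callable<Long> consume = () -> VALUES.get().get(docId);
            return consumePool.submit(consume).get();
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            consumePool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared instance misused: " + sharedAcrossPoolsFails(3));
        System.out.println("own instance value: " + consumeWithOwnInstance(3));
    }
}
```

The point of the sketch is only that the lookup has to happen on the thread
that reads the values; reusing pooled threads across queries is not the
problem.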

Thank you all for building this feature into lucene-core.jar. It removes the
compatibility worries I had about using lucene-codecs.jar.

Best regards,
Duke
If not now, when? If not me, who?


On Tue, Oct 22, 2013 at 12:48 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> It's perfectly fine, and recommended, to reuse a thread across
> different queries (ie, use a thread pool in your app, up above
> Lucene).
>
> The ThreadLocals used in SegmentCoreReaders should not interfere or
> cause problems with that: they can easily be re-used across queries.
>
> Maybe you can boil down the issue you are seeing into a small test case?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, Oct 21, 2013 at 10:35 AM, Duke DAI <duke.dai.007@gmail.com> wrote:
> > Hi Mike,
> >
> > In my scenario, a query thread from a ThreadPool is used to execute each
> > query, so threads have to be reused across different queries. Since
> > SegmentCoreReaders uses a ThreadLocal to hold a per-thread instance, I
> > assume some private state must belong to the given thread (a file offset?
> > I didn't find any other thread-dependent state); otherwise an
> > object-level instance would be enough.
> > A ThreadPool is a very common way to handle heavy query loads. Does the
> > ThreadLocal mechanism support reusing a thread for different queries?
> > As you know, creating threads is expensive, and cleaning up a ThreadLocal
> > from outside is complicated.
> > My test shows that NumericDocValues returns a wrong value. It is still a
> > long value, though, so the upper-level logic can verify whether it is
> > valid.
> >
> > As I described in my earlier mail, in Lucene 4.4
> > Lucene42DocValuesFormat (in-memory) has no problem, while
> > DiskDocValuesFormat (on-disk) does. Now in Lucene 4.5,
> > MemoryDocValuesFormat (in-memory) has no problem, but
> > Lucene45DocValuesFormat (on-disk) does. Coincidence? My test is far more
> > complex than I described: it uses two ThreadPools, one to handle the main
> > query and one to query sub-collections in parallel, with a proper
> > RejectedExecutionHandler (currently, if one sub-query is rejected, all
> > sub-queries are cancelled and failed).
> >
> > Put simply: what is the private state of a per-thread NumericDocValues
> > instance? Can that private state be reused across different queries?
> >
> >
> > Best regards,
> > Duke
> > If not now, when? If not me, who?
> >
> >
> > On Mon, Oct 21, 2013 at 7:26 PM, Michael McCandless <
> > lucene@mikemccandless.com> wrote:
> >
> >> Can you describe what problem you are actually hitting?
> >>
> >> The purpose of docValuesLocal is to hold the per-Thread instance of
> >> each doc values, and re-use it when that thread comes back again
> >> asking for the same doc values.
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
> >> On Mon, Oct 21, 2013 at 6:28 AM, Duke DAI <duke.dai.007@gmail.com> wrote:
> >> > Hi guys,
> >> >
> >> > It seems I have the same problem with Lucene45DocValuesFormat, but no
> >> > problem with MemoryDocValuesFormat. The problem I encountered with
> >> > Lucene 4.4 was with DiskDocValuesFormat, not with
> >> > Lucene42DocValuesFormat.
> >> >
> >> > I dug in a little and found the superficial cause. In
> >> > SegmentCoreReaders there is a ThreadLocal variable, docValuesLocal. Its
> >> > purpose is to avoid rebuilding the data structure repeatedly for a query
> >> > thread. But what if the query thread comes from a thread pool and is
> >> > reused for different queries?
> >> > I removed docValuesLocal and built a lucene-core.jar, and it works with
> >> > my multi-threaded (thread pool) test cases.
> >> >
> >> > Do you have any idea about this? Is this information enough?
> >> >
> >> >
> >> > Thanks,
> >> > Duke
> >> >
> >> >
> >> > Best regards,
> >> > Duke
> >> > If not now, when? If not me, who?
> >> >
> >> >
> >> > On Tue, Aug 13, 2013 at 4:54 PM, Duke DAI <duke.dai.007@gmail.com> wrote:
> >> >
> >> >> Hi experts,
> >> >>
> >> >> I'm upgrading to Lucene 4.4 and trying to use DocValues instead of
> >> >> stored fields for performance reasons. Because the index size is
> >> >> unknown (it depends on the customer), I will use DiskDocValuesFormat,
> >> >> especially for some binary fields. So I wrote my customized Codec:
> >> >>
> >> >>       final Codec codec = new Lucene42Codec() {
> >> >>
> >> >>         // In-memory format by default, disk-backed format for selected fields.
> >> >>         private final Lucene42DocValuesFormat memoryDVFormat =
> >> >>             new Lucene42DocValuesFormat();
> >> >>         private final DiskDocValuesFormat diskDVFormat =
> >> >>             new DiskDocValuesFormat();
> >> >>
> >> >>         @Override
> >> >>         public DocValuesFormat getDocValuesFormatForField(String field) {
> >> >>           if (LucenePluginConstants.INDEX_STORED_RETURNABLE_FIELD.equals(field)
> >> >>               || LucenePluginConstants.PAYLOAD_FIELD_NAME.equals(field)
> >> >>               || LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE.equals(field)) {
> >> >>             return diskDVFormat;
> >> >>           } else {
> >> >>             return memoryDVFormat;
> >> >>           }
> >> >>         }
> >> >>       };
> >> >>       iwc.setCodec(codec);
> >> >>
> >> >> Here the field LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE is a
> >> >> numeric field of type long. The others are binary.
> >> >>
> >> >> Then I consume the DocValues as in the pseudo-code below:
> >> >>     nodeIDDocValuesSource =
> >> >>         MultiDocValues.getNumericValues(searcher.getIndexReader(),
> >> >>                 LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE);
> >> >>
> >> >>    ......
> >> >>    long nodeId = nodeIDDocValuesSource.get(scoreDoc.doc);
> >> >>
> >> >> I'm sure I then get a wrong nodeId, which is checked by the upper-level
> >> >> logic and treated as data corruption.
> >> >>
> >> >>
> >> >> But if I switch to memoryDVFormat for the long field, everything is OK.
> >> >>
> >> >> Also, to support upgrading legacy data, I keep two index formats, DV or
> >> >> stored field, selected by version. If I use stored fields, everything
> >> >> is OK.
> >> >> So I guess there is a bug in DiskDocValuesFormat for the numeric data
> >> >> type. Is it related to byte-aligned numeric compression?
> >> >> Or am I not using DiskDocValuesFormat correctly? It seems to have no
> >> >> other parameters.
> >> >>
> >> >> Sorry that I have no pure Lucene test case yet. I hope someone can shed
> >> >> some light on this.
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> Best regards,
> >> >> Duke
> >> >> If not now, when? If not me, who?
> >> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
>
