lucene-java-user mailing list archives

From Tarun Kumar <ta...@sumologic.com>
Subject Re: lucene index reader performance
Date Mon, 04 Jul 2016 05:39:46 GMT
Thanks for the reply, Michael! In my application, I need to get millions of
documents per search.

The use case is the following: return documents in increasing order of the
time field. The client (caller) can't hold more than a few thousand docs at a
time, so it gets all docIds and the corresponding time field for each doc,
sorts them on time, and fetches n docs at a time. To support this use case, I
am:

- getting all docIds first;
- sorting the docIds on the time field;
- querying n docIds at a time from the client. For each batch, the server
makes an indexReader.document(docId) call for each of the n docs, combines
these docs, and returns them.
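The ordering and paging steps above can be sketched in plain Java as follows. This is a hypothetical illustration, not the code used in the application: DocTime, sortByTime and page are made-up names, the (docId, time) pairs are assumed to have already been collected from the index, and the record syntax needs Java 16+. Only the docIds in each returned page would then be passed to indexReader.document(docId) on the server.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TimeOrderedPaging {
    // Hypothetical pair of a Lucene docId and its "time" field value.
    record DocTime(int docId, long time) {}

    // Sort all (docId, time) pairs in increasing order of time.
    static List<DocTime> sortByTime(List<DocTime> all) {
        List<DocTime> sorted = new ArrayList<>(all);
        sorted.sort(Comparator.comparingLong(DocTime::time));
        return sorted;
    }

    // Return the docIds for the given 0-based page, n docs per page.
    // Pages past the end come back empty.
    static List<Integer> page(List<DocTime> sorted, int page, int n) {
        int from = Math.min(page * n, sorted.size());
        int to = Math.min(from + n, sorted.size());
        List<Integer> ids = new ArrayList<>();
        for (DocTime dt : sorted.subList(from, to)) {
            ids.add(dt.docId());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<DocTime> all = List.of(
            new DocTime(7, 300L), new DocTime(2, 100L),
            new DocTime(5, 200L), new DocTime(9, 400L),
            new DocTime(1, 150L));
        List<DocTime> sorted = sortByTime(all);
        System.out.println(page(sorted, 0, 2)); // [2, 1]
        System.out.println(page(sorted, 1, 2)); // [5, 7]
        System.out.println(page(sorted, 2, 2)); // [9]
    }
}
```

Note that this only covers the ordering and paging; each page still costs one stored-fields lookup per doc on the server, which is where the time goes in the stack trace quoted below.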

The indexReader.document(docId) calls are creating a bottleneck. What
alternatives do you suggest?

On Wed, Jun 29, 2016 at 4:00 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Are you maybe trying to load too many documents for each search request?
>
> The IR.document API is designed to be used to load just a few hits, like a
> page's worth, or ~10 documents, per search.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Jun 28, 2016 at 7:05 AM, Tarun Kumar <tarun@sumologic.com> wrote:
>
>> I am running Lucene 4.6.1. I am trying to get the documents corresponding
>> to docIds. All threads get stuck (well, not exactly stuck, but they spend a
>> LOT of time) at:
>>
>> java.lang.Thread.State: RUNNABLE
>>         at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
>>         at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
>>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>         at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:731)
>>         at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:716)
>>         at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:169)
>>         at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:271)
>>         at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
>>         at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
>>         at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:218)
>>         at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:232)
>>         at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:277)
>>         at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110)
>>         at org.apache.lucene.index.IndexReader.document(IndexReader.java:440)
>>
>>
>> There is no disk throttling. What could cause this?
>>
>> Thanks
>> Tarun
>>
>
>
