lucene-java-user mailing list archives

From Rob Audenaerde <rob.audenae...@gmail.com>
Subject Re: Profiling lucene 5.2.0 based tool
Date Tue, 23 Feb 2016 07:20:18 GMT
Hi Sandeep,

How many threads do you use for indexing? IIRC, the Lucene benchmarks are
run with more than 20 threads.
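
In case it helps, here is a rough sketch of what feeding records from several threads could look like. It uses only the JDK so that it runs on its own: the class name ParallelIndexingSketch, the method indexAll and the docSink callback are made up for illustration and are not part of Lucene. In a real tool docSink would parse the record and call addDocument on the single shared IndexWriter, which can be called from multiple threads. With more than one thread, each thread would need its own Document and Field instances instead of one shared, reused Document.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

final class ParallelIndexingSketch {

    /**
     * Hands every record to docSink from a fixed pool of worker threads and
     * returns how many records were processed without an exception.
     * docSink is a hypothetical stand-in for "parse the CSV record, fill this
     * thread's own Document, call writer.addDocument(doc)".
     */
    static long indexAll(List<String> records, int threads, Consumer<String> docSink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong indexed = new AtomicLong();
        for (String record : records) {
            pool.execute(() -> {
                docSink.accept(record);
                indexed.incrementAndGet();
            });
        }
        // Stop accepting new work, then wait for the queued records to finish.
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return indexed.get();
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("1,100,alpha", "2,200,beta", "3,300,gamma");
        // A no-op sink; the real one would add a document to the index.
        long n = indexAll(records, 4, record -> { });
        System.out.println("processed " + n + " records");
    }
}
```

The thread count is a parameter here, so it is easy to measure throughput at 1, 2, 4, ... threads and see where it stops improving on your hardware.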

-Rob

On Tue, Feb 23, 2016 at 8:01 AM, sandeep das <yarnhadoop@gmail.com> wrote:

> Hi,
>
> I've implemented a tool using lucene-5.2.0 to index my CSV files. The tool
> reads data from CSV files (residing on disk) and creates indexes on the
> local disk. It currently processes data at 3.5 MBps. Each document has 46
> fields, of only three data types: Integer, Long, and String.
> All these fields are part of one CSV record, and they are parsed using a
> custom CSV parser, which is faster than any String split method.
>
> I've configured the IndexWriter with the following parameters:
> 1. setOpenMode(OpenMode.CREATE)
> 2. setCommitOnClose(true)
> 3. setRAMBufferSizeMB(512)   // Also tried 256 and 312, but performance is
> almost the same.
>
> I've read on several blogs that Lucene works much faster than these
> figures, so I suspected a bottleneck in my code and profiled it using
> jvisualvm. The application spends most of its time, 53% of the total, in
> DefaultIndexingChain.processField.
>
>
> The CPU usage of the application splits as follows:
> 1. Reading data from disk takes 5% of the total duration.
> 2. Adding documents takes 93% of the total duration:
>
>    -    postUpdate  -> 12.8%
>    -    doAfterDocument -> 20.6%
>    -    updateDocument  -> 59.8%
>       - finishDocument -> 1.7%
>       - finishStoreFields -> 4.8%
>       - processFields -> 53.1%
>
>
> I'm also attaching a screenshot of the call graph generated by jvisualvm.
>
> I've taken care of the following points:
> 1. Only one IndexWriter instance is created.
> 2. Only one Document instance is created, and it is reused throughout the
> lifetime of the application.
> 3. Documents are never updated, so only addDocument is invoked.
> Note: After going through the code I found that addDocument internally
> just calls updateDocument. Is there any way to avoid calling
> updateDocument and use only the addDocument API?
> 4. The fields are created once and reused; their values are set with the
> setValue APIs for each document.
>
> Any tips to improve the performance would be immensely appreciated.
>
> Regards,
> Sandeep
