lucene-solr-user mailing list archives

From Alexey Serba <ase...@gmail.com>
Subject Re: indexer threading?
Date Tue, 27 Apr 2010 14:53:15 GMT
Hi Brian,

I was testing indexing performance on a high-CPU box recently and ran
into the same issue. I tried different indexing methods (XML,
CSVRequestHandler, and SolrJ + BinaryRequestWriter with multiple
threads). The last method is indeed the fastest. I believe that the
multi-threaded approach gives you better performance if you have
complex text analysis. I had very simple analysis, WhitespaceTokenizer
only, and the performance boost from adding threads was not very
impressive (though still there). I guess that with simple text
analysis, overall performance comes down to synchronization issues.
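
For reference, the multi-threaded client side can be sketched roughly as
below. This is only a minimal illustration using the standard library:
BatchSender is a hypothetical stand-in for whatever call actually ships a
batch to the server (it is not a SolrJ type), and the thread count and
batch size are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

final class ParallelIndexerSketch {

    /** Hypothetical stand-in for the call that sends one batch to the server. */
    interface BatchSender {
        void send(List<String> batch) throws Exception;
    }

    /**
     * Splits docs into fixed-size batches and sends them from a fixed pool
     * of worker threads. Returns the number of documents sent.
     */
    static long indexAll(List<String> docs, int threads, int batchSize, BatchSender sender)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong sent = new AtomicLong();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int start = 0; start < docs.size(); start += batchSize) {
                // subList is a read-only view here; docs must not change while indexing
                List<String> batch = docs.subList(start, Math.min(start + batchSize, docs.size()));
                pending.add(pool.submit(() -> {
                    sender.send(batch);
                    sent.addAndGet(batch.size());
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows a worker failure as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
        return sent.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            docs.add("line " + i);
        }
        // The sender here does nothing; a real one would post the batch to the server.
        long n = indexAll(docs, 4, 500, batch -> { });
        System.out.println("documents sent: " + n);
    }
}
```

If the server side is serialized on a lock, raising the thread count in a
client like this stops helping at some point, which matches what I saw.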

I profiled the application during the indexing phase for CPU time and
monitors, and it seems that most of the blocking is in the following
methods:
- DocumentsWriter.doBalanceRAM
- DocumentsWriter.getThreadState
- SolrIndexWriter.ensureOpen
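
To show what I mean by blocking on monitors, here is a rough sketch that
counts how often threads block on a shared lock, using the JDK's
ThreadMXBean. The shared lock and counter are stand-ins I made up for
illustration; this is not Lucene or Solr code, and the numbers it prints
say nothing about the methods listed above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.atomic.AtomicLong;

final class MonitorContentionSketch {

    /**
     * Runs the given number of threads, each entering one shared monitor
     * `iterations` times. Returns {total increments, total blocked count}.
     */
    static long[] run(int threads, int iterations) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        final Object lock = new Object();      // stand-in for a contended monitor
        final long[] shared = {0};             // guarded by lock
        final AtomicLong blocked = new AtomicLong();

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    synchronized (lock) {
                        shared[0]++;
                    }
                }
                // Sample before the thread exits; dead threads have no ThreadInfo.
                ThreadInfo info = mx.getThreadInfo(Thread.currentThread().getId());
                if (info != null) {
                    blocked.addAndGet(info.getBlockedCount());
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return new long[] {shared[0], blocked.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        long[] result = run(4, 1_000_000);
        System.out.println("increments: " + result[0]);
        System.out.println("times threads blocked on a monitor: " + result[1]);
    }
}
```

The blocked count varies from run to run, so treat it as a rough signal
only. A profiler's monitor view gives the per-method breakdown.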

I don't know the guts of Solr/Lucene in that much detail, so I can't
draw any conclusions. Are there any configuration techniques to improve
indexing performance in a multi-threaded scenario?

Alex

On Mon, Apr 26, 2010 at 6:52 PM, Wawok, Brian <Brian.Wawok@cmegroup.com> wrote:
> Hi,
>
> I was wondering how the multi-threading of the indexer works. I am using SolrJ
> to stream documents to a server. As I add more threads on the client side, I slowly see both
> speed and CPU usage go up on the indexer side. Once I hit about 4 threads, my indexer is at
> 100% CPU usage (of 1 CPU on a 4-way box), and will not do any more work. It is pretty fast,
> doing something like 75k lines of text per second, but I would really like to use all 4 CPUs
> on the indexer. Is this just a limitation of Solr, or is this a limitation of using SolrJ and
> document streaming?
>
>
> Thanks,
>
>
> Brian
>
