lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Performance potential for updating (reindexing) documents
Date Sat, 02 Apr 2016 21:05:26 GMT
On 4/1/2016 8:56 PM, Erick Erickson wrote:
> bq: The bottleneck is definitely Solr.
>
> Since you commented out the server.add(doclist), you're right to focus
> there. I've seen a few things that help.
>
> 1> batch the documents, i.e. in the doclist above the list should be
> on the order of 1,000 docs. Here are some numbers I worked up one time:
> https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/

For that test, I was just seeing how fast MySQL could push data.  Based
on the results I saw from a small-scale test where I *did* add them,
letting the code run the add on the entire database with a single thread
would have taken forever.  I'm aware of the need to batch -- the code
did create batches, it just didn't send them.
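
A minimal sketch of that kind of batching, assuming plain Java collections. The names Batcher and partition are hypothetical, the batch size of 1,000 in the usage note is taken from the quoted suggestion, and nothing here talks to Solr:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups items into batches but does not send them anywhere.
public class Batcher {
    /** Splits items into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Calling Batcher.partition(docs, 1000) on 2,500 items would give two full batches and one of 500. Each batch would then be what gets handed to a call like the server.add(doclist) mentioned above.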

I have a couple of ideas for the design of a multi-threaded indexing
program, but haven't worked out how to implement them.
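
One possible shape for such a program, as a sketch only and not necessarily either of the designs mentioned above: hand already-built batches to a fixed-size thread pool and wait for all of them. ParallelIndexer, indexAll, and the sender callback are hypothetical names. The sender stands in for whatever actually posts a batch to Solr, so no SolrJ calls are assumed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch: sends batches concurrently through a caller-supplied sender.
public class ParallelIndexer {
    /** Runs sender on every batch using `threads` workers; returns the item count sent. */
    public static <T> int indexAll(List<List<T>> batches, int threads,
                                   Consumer<List<T>> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<T> batch : batches) {
                // Each task sends one batch and reports how many items it held.
                results.add(pool.submit(() -> {
                    sender.accept(batch);
                    return batch.size();
                }));
            }
            int sent = 0;
            for (Future<Integer> result : results) {
                sent += result.get(); // blocks until done; surfaces worker failures
            }
            return sent;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed to send", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The sender must be safe to call from several threads at once. This version also holds every batch in memory before sending, which would not suit reading an entire database; a bounded queue fed by the reader thread would be one way around that.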

> 3> Make sure you're using CloudSolrClient.

It's not SolrCloud, so that wouldn't really be helpful. :)

Thanks,
Shawn

