lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Slow indexing speed when collection size is large
Date Sat, 06 May 2017 12:41:25 GMT
On 5/1/2017 10:17 AM, Zheng Lin Edwin Yeo wrote:
> I'm using Solrj for the indexing, not using curl. Normally I bundle
> about 1000 documents for each POST. There's more than 300GB of RAM for
> that server, and I do not use any sharding at the moment.

Looking over your email history on the list, I was able to determine
some information, but not everything I was wondering about.  I have some
questions.

Are you still using the Extracting Request Handler for your rich
document handling, or have you moved Tika processing outside Solr?
If it's outside Solr, is it on different machines?
Are your rich documents still requiring OCR?

Other questions:

On a single Solr server, how much total memory is installed?
What is the total amount of memory reserved for Solr heaps on that server?
What is the total on-disk size of all the Solr indexes on that server?
-- Multiple replicas must be included if they are present on one machine.
From the core (shard replica) perspective, how many documents are on
that server?
-- Multiple replicas must be included here too.
Is there software other than the Solr server process(es) running on that
server?
Are you making queries at the same time you're indexing?

Thanks,
Shawn

