lucene-solr-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Configuration recommendation for SolrCloud
Date Tue, 02 Jul 2019 06:13:51 GMT
As someone else wrote, there are a lot of uncertainties, and I recommend testing yourself to
find the optimal configuration. Some food for thought:
How many clients do you have and what is their concurrency? What operations will they perform?
Do they access Solr directly? You can use JMeter to simulate the querying part (and also the
indexing). Depending on the concurrency of users, you may need to think about the number of
CPUs.
What does moderate indexing mean? How much does the collection grow per day?
Have you thought about putting the ZooKeeper ensemble on dedicated nodes?
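JMeter is the right tool for a real test, but the idea behind it can be sketched in a few lines: N concurrent clients fire queries and you watch throughput and latency percentiles as N grows. The sketch below is a hypothetical stand-in, not a JMeter replacement; the host `localhost:8983`, the collection name `mycollection`, and the helper names `solr_select` and `run_load_test` are all assumptions.

```python
import math
import statistics
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def solr_select(q: str, base_url: str = "http://localhost:8983/solr/mycollection") -> None:
    """Send one query to Solr's /select handler (host and collection are placeholders)."""
    url = f"{base_url}/select?{urllib.parse.urlencode({'q': q})}"
    with urllib.request.urlopen(url) as resp:
        resp.read()  # read the body so the full response time is measured


def run_load_test(
    send_query: Callable[[str], None],
    queries: Sequence[str],
    concurrency: int,
) -> dict:
    """Run `queries` through `send_query` on `concurrency` threads and report
    throughput plus median and 95th-percentile (nearest-rank) latency."""
    if not queries:
        raise ValueError("need at least one query")

    def timed(q: str) -> float:
        start = time.perf_counter()
        send_query(q)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, queries))
    wall = time.perf_counter() - wall_start

    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1]
    return {
        "queries": len(latencies),
        "throughput_qps": len(latencies) / wall,
        "median_s": statistics.median(latencies),
        "p95_s": p95,
    }
```

Against a test instance you might call `run_load_test(solr_select, ["*:*"] * 200, c)` for c in 1, 8, 32 and see where latency starts to climb; that point hints at whether more CPUs are needed.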

Why do you want to use an older Solr version? Why not the newest release with JDK 11?

In what format are the documents? Will you convert them beforehand? What analysis will you do
on the documents (this may have an impact on index size, etc.)?

Also important: how do you plan to reindex the full collection in case a schema field changes?
(Hint: make sure users query through collection aliases, so this can be done without interruption.)
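The alias approach can be sketched as a sequence of Collections API calls: create a new collection, reindex into it, then re-point the alias that clients query. The sketch below only builds the request URLs; the host, the collection and config names, and the helpers `collections_api_url` and `reindex_plan` are hypothetical, and the reindexing step itself is not shown.

```python
from urllib.parse import urlencode

# Hypothetical Solr node; any node in the SolrCloud cluster can serve the Collections API.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action: str, params: dict) -> str:
    """Build a Collections API request URL, e.g. action=CREATE or action=CREATEALIAS."""
    return f"{SOLR_BASE}/admin/collections?{urlencode({'action': action, **params})}"


def reindex_plan(alias: str, old: str, new: str, config: str,
                 num_shards: int, replicas: int) -> list[str]:
    """Steps for a reindex without downtime, assuming clients only ever query `alias`."""
    return [
        # 1. Create a fresh collection using the changed schema/config set.
        collections_api_url("CREATE", {
            "name": new,
            "numShards": num_shards,
            "replicationFactor": replicas,
            "collection.configName": config,
        }),
        # 2. (Reindex all source documents into `new` here; not shown.)
        # 3. Re-point the alias so queries switch from `old` to `new`.
        collections_api_url("CREATEALIAS", {"name": alias, "collections": new}),
        # 4. Drop the old collection once the new one has been verified.
        collections_api_url("DELETE", {"name": old}),
    ]
```

Because clients never reference `products_v1` or `products_v2` directly, only the alias, the switch is invisible to them.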

Normally I would expect a web application in between, also for security reasons. You may need
to scale that as well.

You don’t have to answer those questions here, but I recommend answering them yourself during
a proof of concept on your premises.
I don’t see the point in creating more than one cluster (except for disaster recovery and
cross-data-center replication, if needed). Maybe I am overlooking a reason why you considered
multiple clusters.

> On 25.06.2019 at 22:53, Rahul Goswami <rahul196452@gmail.com> wrote:
> 
> Hello,
> We are running Solr 7.2.1 and planning a deployment which will grow to
> 4 billion documents over time. We have 16 nodes at our disposal. I am
> deciding between 3 configurations:
> 
> 1 cluster - 16 nodes
> vs
> 2 clusters - 8 nodes each
> vs
> 4 clusters - 4 nodes each
> 
> Irrespective of the configuration, each node would host 8 shards (e.g. a
> cluster with 16 nodes would have 16*8=128 shards; similarly, a 4-node
> cluster would have 32 shards). These 16 nodes will be hosted across 4 beefy
> servers, each with 128 GB RAM, so we can allocate 32 GB RAM (not heap space)
> to each node. What configuration would be most efficient for our use case,
> considering moderate-to-heavy indexing and search load? I would also like to
> know the tradeoffs involved, if any. Thanks in advance!
> 
> Regards,
> Rahul
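A quick back-of-the-envelope check of the numbers in the question supports the point that splitting into several clusters does not change the total shard count. The script below assumes documents are spread evenly across all shards and counts one replica per shard (the question does not mention replicas); the constants come from the question, while the helper name `layout` is hypothetical.

```python
TOTAL_DOCS = 4_000_000_000   # target size from the question
TOTAL_NODES = 16             # nodes available
SHARDS_PER_NODE = 8          # fixed in every proposed layout
RAM_PER_NODE_GB = 32         # 4 servers x 128 GB / 16 nodes


def layout(num_clusters: int) -> dict:
    """Shard and memory figures for splitting the 16 nodes into equal clusters."""
    nodes = TOTAL_NODES // num_clusters
    shards_per_cluster = nodes * SHARDS_PER_NODE
    total_shards = shards_per_cluster * num_clusters
    return {
        "clusters": num_clusters,
        "nodes_per_cluster": nodes,
        "shards_per_cluster": shards_per_cluster,
        "total_shards": total_shards,
        # Even distribution assumed; real routing may skew this.
        "docs_per_shard": TOTAL_DOCS // total_shards,
        "ram_per_cluster_gb": nodes * RAM_PER_NODE_GB,
    }


for n in (1, 2, 4):
    print(layout(n))
```

In all three layouts there are 128 shards of roughly 31.25 million documents each. What changes is that, with several clusters, a search over all documents must be sent to every cluster and merged by the client or a web application, instead of Solr doing the distributed merge within one cluster.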
