lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Determining the Number of Solr Shards
Date Thu, 08 Jan 2015 02:36:03 GMT
1,000 queries/second is not trivial either. My starting point for QPS
per replica is about 50.
But that's entirely a "straw man" number, and (as the link Shawn provided
indicates) only testing will determine if that's realistic.

So going for 1,000 queries/second at roughly 50 QPS per replica, you're
talking... 20 replicas for each shard.
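
The replica estimate above is just a division rounded up. A minimal Python sketch of it follows; the function name is made up for illustration, and the 50 QPS per replica figure is the straw-man number from this message, not a measured value.

```python
import math


def replicas_for_qps(target_qps: float, qps_per_replica: float) -> int:
    """Replicas of each shard needed to serve target_qps, rounded up.

    qps_per_replica is an assumption until load testing supplies a real number.
    """
    return math.ceil(target_qps / qps_per_replica)


# Straw-man inputs from the message: 1,000 queries/second, 50 QPS per replica.
print(replicas_for_qps(1000, 50))  # 20
```

If testing shows a replica sustains more or less than 50 QPS, only the second argument changes.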

And we haven't even talked about the number of shards yet.

You're actually quite a ways from being able to predict much about hardware.

For instance, what is your retention, i.e. how long will you have to
keep the documents?
Let's assume that your _average_ writes/second is even 5,000. That's
18M docs/hour or 400+M docs/day. My (again straw-man) number for the
number of docs you can put on a single shard is 100M (again with the
caveat that only testing will tell; this may be 20M and may be 200M or
even more).

Let's be generous and, for round numbers, assume you're adding 400M docs/day and
each shard can hold 200M docs (WARNING! These are _very_ optimistic numbers!).
That's 2 new shards per day at 20 replicas each, so you're talking
20 X 2 X (number of days of retention you need) replicas.
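
To make the back-of-the-envelope arithmetic above concrete, here is a minimal Python sketch. The function names and the 7-day retention used in the example are illustrative assumptions, not from the thread; the other inputs are the straw-man numbers quoted above and would need to be replaced with measured values.

```python
import math

SECONDS_PER_DAY = 86_400


def docs_per_day(writes_per_second: int) -> int:
    """Documents indexed per day at a steady average write rate."""
    return writes_per_second * SECONDS_PER_DAY


def total_replicas(daily_docs: int, docs_per_shard: int,
                   retention_days: int, replicas_per_shard: int) -> int:
    """Total replicas (cores) needed to hold retention_days worth of documents."""
    shards = math.ceil(daily_docs * retention_days / docs_per_shard)
    return shards * replicas_per_shard


# 5,000 writes/second is the "400+M docs/day" figure from the message.
print(docs_per_day(5_000))  # 432000000

# Round numbers from the message: 400M docs/day, 200M docs/shard, 20 replicas
# per shard. The 7-day retention is a hypothetical value chosen for this example.
print(total_replicas(400_000_000, 200_000_000, 7, 20))  # 280
```

With these inputs the result matches 20 X 2 X 7 = 280; the retention period is the multiplier that dominates the estimate.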


Don't mean to be too much of a downer, but this is not as simple as
throwing a few big machines at the problem and being good to go.

Best,
Erick

On Wed, Jan 7, 2015 at 6:14 PM, Nishanth S <nishanth.2884@gmail.com> wrote:
> Thanks Shawn and Walter. Yes, those are 12,000 writes/second. Reads for the
> moment would be around 1,000 reads/second. I guess finding the right
> number of shards would be my starting point.
>
> Thanks,
> Nishanth
>
>
> On Wed, Jan 7, 2015 at 6:28 PM, Walter Underwood <wunder@wunderwood.org>
> wrote:
>
>> This is described as “write heavy”, so I think that is 12,000
>> writes/second, not queries.
>>
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/
>>
>>
>> On Jan 7, 2015, at 5:16 PM, Shawn Heisey <apache@elyograg.org> wrote:
>>
>> > On 1/7/2015 3:29 PM, Nishanth S wrote:
>> >> I am working on coming up with a Solr architecture layout for my use
>> >> case. We are a very write-heavy application with no downtime tolerance
>> >> and have low SLAs on reads when compared with writes. I am looking at
>> >> around 12K tps with an average index size per Solr document in the
>> >> range of 6 kB. I would like to go with 3 replicas for that extra fault
>> >> tolerance and am trying to identify the number of shards. The machines
>> >> are monstrous and have around 100 GB of RAM and more than 24 cores
>> >> each. Is there a way to arrive at the desired number of shards in this
>> >> case? Any pointers would be helpful.
>> >
>> > This is one of those questions that's nearly impossible to answer
>> > without field trials that have a production load on a production index.
>> > Minor changes to either config or schema can have a major impact on the
>> > query load Solr will support.
>> >
>> >
>> > https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>> >
>> > A query load of 12000 queries per second is VERY high.  That is likely
>> > to require a **LOT** of hardware, because you're going to need a lot of
>> > replicas.  Because each server will be handling quite a lot of
>> > simultaneous queries, the best results will come from having only one
>> > replica (solr core) per server.
>> >
>> > Generally you'll get better results for a high query load if you don't
>> > shard your index, but depending on how many docs you have, you might
>> > want to shard.  You haven't said how many docs you have.
>> >
>> > The key to excellent performance with Solr is to make sure that the
>> > system never hits the disk to read index data -- for 12000 queries per
>> > second, the index must be fully cached in RAM.  If Solr must go to the
>> > actual disk, query performance will drop significantly.
>> >
>> > Thanks,
>> > Shawn
>> >
>>
>>
