lucene-solr-user mailing list archives

From Erick Erickson <>
Subject Re: Solr cloud planning
Date Wed, 04 Oct 2017 13:30:09 GMT
You'll almost certainly have to shard then. First of all, Lucene has a
hard limit of 2^31 docs in a single index, so there's roughly a
2B-document cap per shard. There's no such limit on the number of docs
in the collection (e.g. 5 shards each holding 2B docs gives 10B docs
total in the collection).
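For rough planning, that arithmetic can be sketched in a few lines (a minimal illustration; `shards_needed` is a hypothetical helper, and in practice the per-shard target should come from your own testing, not the hard Lucene ceiling):

```python
import math

# Lucene's hard per-core ceiling: a single index cannot hold more than
# 2^31 - 1 documents.
LUCENE_MAX_DOCS = 2**31 - 1

def shards_needed(total_docs, docs_per_shard):
    """Minimum shard count that keeps each shard at or under docs_per_shard."""
    if docs_per_shard > LUCENE_MAX_DOCS:
        raise ValueError("per-shard target exceeds Lucene's hard limit")
    return math.ceil(total_docs / docs_per_shard)

# 10B docs at a (very generous) 2B per shard -> 5 shards, as in the example.
print(shards_needed(10_000_000_000, 2_000_000_000))  # -> 5
# A more realistic 250M-per-shard target needs far more shards.
print(shards_needed(10_000_000_000, 250_000_000))    # -> 40
```

The point of the helper is only to show the scale of the difference between the theoretical ceiling and a realistic per-shard target.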

But nobody I know of runs anywhere near that many documents on a
shard. I've seen 200M-300M docs per shard give good response times;
I've also seen 20M docs strain a beefy server.

Here's an outline of what it takes to find out:

The idea is to set up a test environment that you push to the breaking
point with _your_ data, queries, and environment. You can do this with
just two machines; from there it's just a matter of multiplying...
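One way to run that strain test is to replay representative queries and track tail latency (a hypothetical sketch; the `fake_query` stub below stands in for real HTTP requests against your Solr collection's /select endpoint with your own data and queries):

```python
import time

def measure_latencies(query_fn, queries):
    """Run each query and return per-query wall-clock latencies in ms."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        query_fn(q)  # in a real test: an HTTP GET against Solr's /select
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

def p95(latencies):
    """95th-percentile latency: the usual 'is response time OK' number."""
    ranked = sorted(latencies)
    return ranked[int(0.95 * (len(ranked) - 1))]

# Stub standing in for a real Solr query; in an actual test, replace it
# with a request to http://host:8983/solr/<collection>/select while
# indexing runs in the background, and grow the doc count until p95
# exceeds your target.
def fake_query(q):
    time.sleep(0.001)

lat = measure_latencies(fake_query, ["*:*"] * 50)
print(f"p95 = {p95(lat):.1f} ms")
```

Measuring a high percentile rather than the average matters here, because it's the slow tail that users notice once a shard starts to strain.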


On Wed, Oct 4, 2017 at 6:07 AM, gatanathoa <> wrote:
> There is a very large amount of data and there will be a constant addition of
> more data. There will be hundreds of millions if not billions of items.
> We have to be able to be constantly indexing items while also allowing
> for searching. Sadly there is no way to know how much searching will be
> done, but I was told to expect a fair amount. (I have no idea what
> "a fair amount" means either.)
> I am not sure that only one shard will be adequate in this setup. The speed
> of the search results is the key here. There is also no way to test this
> prior to implementation.
> Is this enough information to be able to provide some guidelines?
> --
> Sent from:
