lucene-solr-user mailing list archives

From Erick Erickson <>
Subject Re: Solr cloud performance degradation with billions of documents
Date Fri, 15 Aug 2014 23:30:25 GMT

bq: I would have agreed with you fully an hour ago.....

Well, I now disagree with myself too :).... I don't mind
talking to myself. I don't even mind arguing with myself. I
really _do_ mind losing the arguments I have with
myself though.


OK, that has a much better chance of working; I obviously
misunderstood. So you'll have 60 different collections, and each
collection will have one shard on each machine.

When the time comes to roll some of the collections off the
end due to age, "collection aliasing" may be helpful. I still think
you're significantly undersized, but you know your problem
space better than I do.
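For reference, collection aliasing is driven through the Collections API
(CREATEALIAS). A minimal sketch of building that request for a rolling
60-day window; the daily naming scheme `logs_YYYY-MM-DD`, the host, and
the alias name are assumptions for illustration, not from this thread:

```python
from datetime import date, timedelta
from urllib.parse import urlencode

def createalias_url(solr_base, alias, newest_day, days):
    """Build a Collections API CREATEALIAS request that points `alias`
    at the most recent `days` daily collections."""
    names = [
        "logs_" + (newest_day - timedelta(days=i)).isoformat()
        for i in range(days)
    ]
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(names),
    }
    return solr_base + "/admin/collections?" + urlencode(params)

# Re-issuing this each day "rolls" the oldest collection off the alias
# without touching the underlying indexes; queries go to the alias.
url = createalias_url("http://localhost:8983/solr", "last60",
                      date(2014, 8, 15), 60)
```

Clients then query the alias (`/solr/last60/select?...`) instead of the
individual daily collections.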

I fear the problem will be this: you won't even be able to do
basic searches as the number of shards on a particular
machine increases. To test, fire off a simple search for each of
your 60 days. I expect it'll blow you out of the water. This
assumes that all your shards are hosted in the same JVM
on each of your 32 machines. But that's totally a guess.
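One way to script that smoke test is to hit each daily collection with
one cheap query and watch for errors or blown-out response times. A
hedged sketch that just builds the URLs; the collection names, host,
and the choice of `q=*:*` with `rows=0` are my assumptions:

```python
from datetime import date, timedelta
from urllib.parse import urlencode

def smoke_test_urls(solr_base, newest_day, days):
    """One trivial query per daily collection. rows=0 keeps the
    response tiny, so what you measure is mostly the search itself."""
    urls = []
    for i in range(days):
        coll = "logs_" + (newest_day - timedelta(days=i)).isoformat()
        params = urlencode({"q": "*:*", "rows": 0})
        urls.append(f"{solr_base}/{coll}/select?{params}")
    return urls

urls = smoke_test_urls("http://localhost:8983/solr", date(2014, 8, 15), 60)
# Fire these sequentially (e.g. with urllib.request) and log QTime for
# each; a box that is swapping or GC-thrashing shows up immediately.
```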

Keep us posted!

On Fri, Aug 15, 2014 at 2:40 PM, Toke Eskildsen <> wrote:
> Erick Erickson [] wrote:
>> I guess that my main issue is that from everything I've seen so far,
>> this project is doomed. You simply cannot put 7B documents in a single
>> shard, period. Lucene has a 2B hard limit.
> I would have agreed with you fully an hour ago and actually planned to ask Wilburn to
> check if he had corrupted his indexes. However, his latest post suggests that the scenario
> is more about having a larger number of more reasonably sized shards in play than building
> gigantic shards.
>> For instance, Wilburn is talking about only using 6G of memory. Even
>> at 2B docs/shard, I'd be surprised to see it function at all. Don't
>> try sorting on a timestamp for instance.
> I haven't understood Wilburn's setup completely, as it seems to me that he will quickly
> run out of memory for starting new shards. But if we are looking at shards of 30GB and 160M
> documents, 6GB sounds a lot better.
> Regards,
> Toke Eskildsen
