lucene-solr-user mailing list archives

From Patrick Henry <>
Subject Re: Handling growth
Date Thu, 20 Nov 2014 00:56:41 GMT

Interesting, I'm still unfamiliar with the limitations (if any) of aliasing.
Does your architecture use realtime get?
On Nov 18, 2014 11:49 AM, "Michael Della Bitta" <> wrote:

> We're achieving some success by treating aliases as collections and
> collections as shards.
> More specifically, there's a read alias that spans all the collections,
> and a write alias that points at the 'latest' collection. Every week, I
> create a new collection, add it to the read alias, and point the write
> alias at it.
> Michael
> On 11/14/14 07:06, Toke Eskildsen wrote:
>> Patrick Henry [] wrote:
>>> I am working with a Solr collection that is several terabytes in size
>>> over several hundred million documents.  Each document is very rich, and
>>> over the past few years we have consistently quadrupled the size of our
>>> collection annually.  Unfortunately, this sits on a single node with
>>> only a few hundred megabytes of memory - so our performance is less than
>>> ideal.
>> I assume you mean gigabytes of memory. If you have not already done so,
>> switching to SSDs for storage should buy you some more time.
>>> [Going for SolrCloud]  We are continuously adding documents and never
>>> change existing ones.  Based on that, one individual recommended that I
>>> implement custom hashing and route the latest documents to the shard with
>>> the fewest documents, and when that shard fills up, add a new shard and
>>> index on the new shard, rinse and repeat.
>> We have quite a similar setup, where we produce a never-changing shard
>> once every 8 days and add it to our cloud. One could also combine this
>> setup with a single live shard, for keeping the full index constantly up to
>> date. The memory overhead of running an immutable shard is smaller than
>> that of a mutable one and easier to fine-tune. It also allows you to optimize the
>> index down to a single segment, which requires a bit less processing power
>> and saves memory when faceting. There's a description of our setup at
>> From an administrative point of view, we like having complete control
>> over each shard. We keep track of what goes in it and in case of schema or
>> analysis chain changes, we can re-build each shard one at a time and deploy
>> them continuously, instead of having to re-build everything in one go on a
>> parallel setup. Of course, fundamental changes to the schema would require
>> a complete re-build before deploy, so we hope to avoid that.
>> - Toke Eskildsen
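
For reference, the weekly alias rotation Michael describes above (create a new collection, add it to the read alias, point the write alias at it) could be sketched roughly as follows. This is only an illustration, not Michael's actual tooling: the base URL, the alias names docs_read and docs_write, the collection names, the config name, and numShards=1 are all assumptions. The sketch only builds the Solr Collections API request URLs (CREATE and CREATEALIAS) and leaves sending them to whatever HTTP client you prefer.

```python
from urllib.parse import urlencode

# Assumed base URL of a SolrCloud node; adjust for your cluster.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


def weekly_rotation_urls(existing, new_collection, config_name,
                         read_alias="docs_read", write_alias="docs_write"):
    """Return the three requests for one rotation, in the order to send them.

    The alias and collection names are hypothetical placeholders.
    """
    all_collections = list(existing) + [new_collection]
    return [
        # 1. Create this week's collection (a single shard is assumed here).
        collections_api_url("CREATE", name=new_collection, numShards=1,
                            **{"collection.configName": config_name}),
        # 2. Redefine the read alias so it spans every collection.
        collections_api_url("CREATEALIAS", name=read_alias,
                            collections=",".join(all_collections)),
        # 3. Point the write alias at the newest collection only.
        collections_api_url("CREATEALIAS", name=write_alias,
                            collections=new_collection),
    ]


if __name__ == "__main__":
    for url in weekly_rotation_urls(["docs_2014w45", "docs_2014w46"],
                                    "docs_2014w47", "docs_conf"):
        print(url)
```

Issuing CREATEALIAS with an existing alias name replaces the alias definition, which is why the same call serves both to extend the read alias and to move the write alias. The write alias is kept on a single collection because updates sent to an alias need one unambiguous target.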
