lucene-solr-user mailing list archives

From Michael Joyner <mich...@newsrx.com>
Subject Re: Shard size variation
Date Wed, 02 May 2018 16:22:20 GMT
The main reason we go this route is that after a while (with default
settings) we end up with hundreds of shards, and performance of course
drops abysmally as a result. By using a stepped optimize, a) we don't run
into the "we need 3x+ headroom" issue, and b) the performance penalty
during the optimize is less than the performance penalty of leaving
hundreds of shards unoptimized.

BTW, as we use a batched insert/update cycle [once daily], we only
optimize down to a single segment after a complete batch has been run.
During the batch, though, we reduce the segment count to a max of 16
every 250K inserts/updates to prevent the large-segment-count performance
penalty.
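
A minimal sketch of that batch-time cadence, for illustration only. The class name, the `recordUpdate`/`finishBatch` methods, and the `IntConsumer` hook are hypothetical and not taken from our code base; in a real setup the hook would wrap something like `solrClient.optimize(true, true, maxSegments)`. The one-segment-at-a-time step-down at the end of the batch is also an assumption, borrowed from the stepped loop quoted further down.

```java
import java.util.function.IntConsumer;

/**
 * Sketch: reduce to a mid-batch segment cap every N updates, then step
 * down to a single segment once the batch is complete.
 * The IntConsumer is a hypothetical hook standing in for the Solr call.
 */
final class BatchOptimizeSchedule {
    private final int updatesPerReduce;    // e.g. 250_000
    private final int midBatchMaxSegments; // e.g. 16
    private final IntConsumer optimizeTo;  // receives the max segment count
    private long updatesSinceReduce = 0;

    BatchOptimizeSchedule(int updatesPerReduce, int midBatchMaxSegments,
                          IntConsumer optimizeTo) {
        if (updatesPerReduce < 1 || midBatchMaxSegments < 1) {
            throw new IllegalArgumentException("both limits must be >= 1");
        }
        this.updatesPerReduce = updatesPerReduce;
        this.midBatchMaxSegments = midBatchMaxSegments;
        this.optimizeTo = optimizeTo;
    }

    /** Call once per insert/update sent during the batch. */
    void recordUpdate() {
        updatesSinceReduce++;
        if (updatesSinceReduce >= updatesPerReduce) {
            optimizeTo.accept(midBatchMaxSegments);
            updatesSinceReduce = 0;
        }
    }

    /** Call after the whole batch has run: step down to one segment. */
    void finishBatch() {
        for (int segments = midBatchMaxSegments; segments >= 1; segments--) {
            optimizeTo.accept(segments);
        }
    }
}
```

With the numbers above this would be constructed as `new BatchOptimizeSchedule(250_000, 16, hook)`, calling `recordUpdate()` per document and `finishBatch()` once at the end of the daily run.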


On 04/30/2018 07:10 PM, Erick Erickson wrote:
> There's really no good way to purge deleted documents from the index
> other than to wait until merging happens.
>
> Optimize/forceMerge and expungeDeletes both suffer from the problem
> that they create massive segments that then stick around for a very
> long time, see:
> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
>
> Best,
> Erick
>
> On Mon, Apr 30, 2018 at 1:56 PM, Michael Joyner <michael@newsrx.com> wrote:
>> Based on experience, 2x head room is not always enough, sometimes not even
>> 3x, if you are optimizing from many segments down to 1 segment in a single
>> go.
>>
>> We have however figured out a way that can work with as little as 51% free
>> space via the following iteration cycle:
>>
>> public void solrOptimize() {
>>     int initialMaxSegments = 256;
>>     int finalMaxSegments = 1;
>>     if (isShowSegmentCounter()) {
>>         log.info("Optimizing ...");
>>     }
>>     try (SolrClient solrServerInstance = getSolrClientInstance()) {
>>         for (int segments = initialMaxSegments; segments >= finalMaxSegments; segments--) {
>>             if (isShowSegmentCounter()) {
>>                 System.out.println("Optimizing to a max of " + segments + " segments.");
>>             }
>>             solrServerInstance.optimize(true, true, segments);
>>         }
>>     } catch (SolrServerException | IOException e) {
>>         throw new RuntimeException(e);
>>     }
>> }
>>
>>
>> On 04/30/2018 04:23 PM, Walter Underwood wrote:
>>> You need 2X the minimum index size in disk space anyway, so don’t worry
>>> about keeping the indexes as small as possible. Worry about having enough
>>> headroom.
>>>
>>> If your indexes are 250 GB, you need 250 GB of free space.
>>>
>>> wunder
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>> On Apr 30, 2018, at 1:13 PM, Antony A <antonyaugustus@gmail.com> wrote:
>>>>
>>>> Thanks Erick/Deepak.
>>>>
>>>> The cloud is running on baremetal (128 GB/24 cpu).
>>>>
>>>> Is there an option to run a compact on the data files to make the size
>>>> equal on both the clouds? I am trying to find all the options before I
>>>> add the new fields into the production cloud.
>>>>
>>>> Thanks
>>>> AA
>>>>
>>>> On Mon, Apr 30, 2018 at 10:45 AM, Erick Erickson <erickerickson@gmail.com> wrote:
>>>>
>>>>> Anthony:
>>>>>
>>>>> You are probably seeing the results of removing deleted documents from
>>>>> the shards as they're merged. Even on replicas in the same _shard_,
>>>>> the size of the index on disk won't necessarily be identical. This has
>>>>> to do with which segments are selected for merging, which are not
>>>>> necessarily coordinated across replicas.
>>>>>
>>>>> The test is if the number of docs on each collection is the same. If
>>>>> it is, then don't worry about index sizes.
>>>>>
>>>>> Best,
>>>>> Erick
>>>>>
>>>>> On Mon, Apr 30, 2018 at 9:38 AM, Deepak Goel <deicool@gmail.com> wrote:
>>>>>> Could you please also give the machine details of the two clouds you are
>>>>>> running?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Deepak
>>>>>>
>>>>>> On Mon, Apr 30, 2018 at 9:51 PM, Antony A <antonyaugustus@gmail.com> wrote:
>>>>>>> Hi Shawn,
>>>>>>>
>>>>>>> The cloud is running version 6.2.1. with ClassicIndexSchemaFactory
>>>>>>>
>>>>>>> The sum of size from admin UI on all the shards is around 265 G vs 224 G
>>>>>>> between the two clouds.
>>>>>>>
>>>>>>> I created the collection using "numShards" so compositeId router.
>>>>>>>
>>>>>>> If you need more information, please let me know.
>>>>>>>
>>>>>>> Thanks
>>>>>>> AA
>>>>>>>
>>>>>>> On Mon, Apr 30, 2018 at 10:04 AM, Shawn Heisey <apache@elyograg.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On 4/30/2018 9:51 AM, Antony A wrote:
>>>>>>>>
>>>>>>>>> I am running two separate solr clouds. I have 8 shards in each with a
>>>>>>>>> total of 300 million documents. Both the clouds are indexing the
>>>>>>>>> document from the same source/configuration.
>>>>>>>>>
>>>>>>>>> I am noticing there is a difference in the size of the collection
>>>>>>>>> between them. I am planning to add more shards to see if that helps
>>>>>>>>> solve the issue. Has anyone come across similar issue?
>>>>>>>>>
>>>>>>>> There's no information here about exactly what you are seeing, what you
>>>>>>>> are expecting to see, and why you believe that what you are seeing is
>>>>>>>> wrong.
>>>>>>>>
>>>>>>>> You did say that there is "a difference in size".  That is a very vague
>>>>>>>> problem description.
>>>>>>>>
>>>>>>>> FYI, unless a SolrCloud collection is using the implicit router, you
>>>>>>>> cannot add shards.  And if it *IS* using the implicit router, then you
>>>>>>>> are 100% in control of document routing -- Solr cannot influence that at
>>>>>>>> all.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Shawn
>>>>>>>>
>>>>>>>>

