lucene-solr-user mailing list archives

From Antony A <antonyaugus...@gmail.com>
Subject Re: Shard size variation
Date Mon, 30 Apr 2018 23:31:34 GMT
Thank you all. I have around 70% free space in production. I will compute the extra space needed for the additional
fields.
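The headroom rule of thumb discussed in this thread (an optimize can temporarily double the index on disk, so keep free space at least equal to the index size) can be sketched as a quick check. This is only an illustration; the class and method names are made up for this example, not from any Solr API:

```java
// Sketch of the headroom rule of thumb: a full optimize rewrites the
// index, so plan for free space >= the current index size (the "2x" rule).
public class HeadroomCheck {

    // Free bytes needed to safely optimize an index of the given size.
    static long requiredFreeBytes(long indexBytes) {
        return indexBytes;
    }

    // True when the volume has enough free space for a full optimize.
    static boolean hasHeadroom(long indexBytes, long freeBytes) {
        return freeBytes >= requiredFreeBytes(indexBytes);
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long indexBytes = 250L * gb; // 250 GB, as in the thread below
        System.out.println("Need at least " + requiredFreeBytes(indexBytes)
                + " bytes free to optimize");
        System.out.println("Is 700 GB free enough? "
                + hasHeadroom(indexBytes, 700L * gb));
    }
}
```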


Sent from my mobile. Please excuse any typos.

> On Apr 30, 2018, at 5:10 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> 
> There's really no good way to purge deleted documents from the index
> other than to wait until merging happens.
> 
> Optimize/forceMerge and expungeDeletes both suffer from the problem
> that they create massive segments that then stick around for a very
> long time, see:
> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> 
> Best,
> Erick
> 
>> On Mon, Apr 30, 2018 at 1:56 PM, Michael Joyner <michael@newsrx.com> wrote:
>> Based on experience, 2x headroom is not always enough, sometimes not
>> even 3x, if you are optimizing from many segments down to 1 segment in a
>> single go.
>> 
>> We have however figured out a way that can work with as little as 51% free
>> space via the following iteration cycle:
>> 
>> public void solrOptimize() {
>>     int initialMaxSegments = 256;
>>     int finalMaxSegments = 1;
>>     if (isShowSegmentCounter()) {
>>         log.info("Optimizing ...");
>>     }
>>     // needs org.apache.solr.client.solrj.SolrClient, SolrServerException,
>>     // and java.io.IOException
>>     try (SolrClient solrServerInstance = getSolrClientInstance()) {
>>         // Stepping maxSegments down one at a time keeps each merge small,
>>         // so peak temporary disk usage stays far below a one-shot optimize.
>>         for (int segments = initialMaxSegments;
>>                 segments >= finalMaxSegments; segments--) {
>>             if (isShowSegmentCounter()) {
>>                 System.out.println("Optimizing to a max of " + segments
>>                         + " segments.");
>>             }
>>             solrServerInstance.optimize(true, true, segments);
>>         }
>>     } catch (SolrServerException | IOException e) {
>>         throw new RuntimeException(e);
>>     }
>> }
>> 
>> 
>>> On 04/30/2018 04:23 PM, Walter Underwood wrote:
>>> 
>>> You need 2X the minimum index size in disk space anyway, so don’t worry
>>> about keeping the indexes as small as possible. Worry about having enough
>>> headroom.
>>> 
>>> If your indexes are 250 GB, you need 250 GB of free space.
>>> 
>>> wunder
>>> Walter Underwood
>>> wunder@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>>> On Apr 30, 2018, at 1:13 PM, Antony A <antonyaugustus@gmail.com> wrote:
>>>> 
>>>> Thanks Erick/Deepak.
>>>> 
>>>> The cloud is running on baremetal (128 GB/24 cpu).
>>>> 
>>>> Is there an option to run a compaction on the data files to make the size
>>>> equal on both clouds? I am trying to find all the options before I add
>>>> the new fields into the production cloud.
>>>> 
>>>> Thanks
>>>> AA
>>>> 
>>>> On Mon, Apr 30, 2018 at 10:45 AM, Erick Erickson
>>>> <erickerickson@gmail.com>
>>>> wrote:
>>>> 
>>>>> Antony:
>>>>> 
>>>>> You are probably seeing the results of removing deleted documents from
>>>>> the shards as they're merged. Even on replicas in the same _shard_,
>>>>> the size of the index on disk won't necessarily be identical. This has
>>>>> to do with which segments are selected for merging, which are not
>>>>> necessarily coordinated across replicas.
>>>>> 
>>>>> The test is if the number of docs on each collection is the same. If
>>>>> it is, then don't worry about index sizes.
>>>>> 
>>>>> Best,
>>>>> Erick
>>>>> 
>>>>>> On Mon, Apr 30, 2018 at 9:38 AM, Deepak Goel <deicool@gmail.com> wrote:
>>>>>> 
>>>>>> Could you please also give the machine details of the two clouds you
>>>>>> are running?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Deepak
>>>>>> "The greatness of a nation can be judged by the way its animals are
>>>>>> treated. Please stop cruelty to Animals, become a Vegan"
>>>>>> 
>>>>>> +91 73500 12833
>>>>>> deicool@gmail.com
>>>>>> 
>>>>>> Facebook: https://www.facebook.com/deicool
>>>>>> LinkedIn: www.linkedin.com/in/deicool
>>>>>> 
>>>>>> "Plant a Tree, Go Green"
>>>>>> 
>>>>>> Make In India : http://www.makeinindia.com/home
>>>>>> 
>>>>>> On Mon, Apr 30, 2018 at 9:51 PM, Antony A <antonyaugustus@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi Shawn,
>>>>>>> 
>>>>>>> The cloud is running version 6.2.1 with ClassicIndexSchemaFactory.
>>>>>>> 
>>>>>>> The sum of the shard sizes from the admin UI is around 265 GB vs
>>>>>>> 224 GB between the two clouds.
>>>>>>> 
>>>>>>> I created the collection using "numShards" so compositeId router.
>>>>>>> 
>>>>>>> If you need more information, please let me know.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> AA
>>>>>>> 
>>>>>>> On Mon, Apr 30, 2018 at 10:04 AM, Shawn Heisey <apache@elyograg.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>>> On 4/30/2018 9:51 AM, Antony A wrote:
>>>>>>>>> 
>>>>>>>>> I am running two separate Solr clouds. I have 8 shards in each, with
>>>>>>>>> a total of 300 million documents. Both clouds are indexing the
>>>>>>>>> documents from the same source/configuration.
>>>>>>>>> 
>>>>>>>>> I am noticing there is a difference in the size of the collection
>>>>>>>>> between them. I am planning to add more shards to see if that helps
>>>>>>>>> solve the issue. Has anyone come across a similar issue?
>>>>>>>>> 
>>>>>>>> There's no information here about exactly what you are seeing, what
>>>>>>>> you are expecting to see, and why you believe that what you are
>>>>>>>> seeing is wrong.
>>>>>>>> 
>>>>>>>> You did say that there is "a difference in size".  That is a very
>>>>>>>> vague problem description.
>>>>>>>> 
>>>>>>>> FYI, unless a SolrCloud collection is using the implicit router, you
>>>>>>>> cannot add shards.  And if it *IS* using the implicit router, then
>>>>>>>> you are 100% in control of document routing -- Solr cannot influence
>>>>>>>> that at all.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Shawn
>>>>>>>> 
>>>>>>>> 
>>> 
>> 
