incubator-cassandra-user mailing list archives

From Safdar Kureishy <safdar.kurei...@gmail.com>
Subject Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster
Date Mon, 25 Jun 2012 20:26:27 GMT
Got it. Thanks Jake. Will do.

Safdar
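For reference, the change Jake describes in the thread below amounts to editing solandra.properties on each node and restarting. A minimal sketch of the file edit (the property name comes from this thread; the file path and helper name are illustrative assumptions):

```python
# Sketch: raise solandra.shards.at.once to a multiple of the node count.
# The property name is from the thread; the path is an assumption for your install.
from pathlib import Path

def set_shards_at_once(path: str, value: int) -> None:
    """Rewrite the solandra.shards.at.once line in a properties file in place."""
    p = Path(path)
    out = []
    for line in p.read_text().splitlines():
        if line.startswith("solandra.shards.at.once="):
            out.append(f"solandra.shards.at.once={value}")
        else:
            out.append(line)  # leave every other property untouched
    p.write_text("\n".join(out) + "\n")

# Example: 10 shards for a 5-node cluster, then restart each node.
# set_shards_at_once("solandra.properties", 10)
```

Per Jake's note below, the restart does not affect data already in the index.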


On Mon, Jun 25, 2012 at 4:16 PM, Jake Luciani <jakers@gmail.com> wrote:

> Hi Safdar,
>
> Yes, you should make it a multiple.  The issue is that each shard 'sticks'
> to a given node, but there is no way to guarantee that 5 random keys will
> distribute equally across 5 nodes.  The idea is that they eventually will
> as you add more and more keys, so raising the number of shards at once can
> make that happen faster.  You can change this parameter and restart the
> nodes without affecting your old data.
>
> If you have more issues raise it on the github issue tab for Solandra.
>
> -Jake
>
> On Mon, Jun 25, 2012 at 2:23 AM, Safdar Kureishy <
> safdar.kureishy@gmail.com> wrote:
>
>> Hi Jake,
>>
>> Thanks. Yes, I forgot to mention that I had already raised the
>> solandra.shards.at.once param from 4 to 5 (to match the number of nodes).
>> Should I have raised it to 10 or 15 (a multiple of 5) instead? I have now
>> added all the documents I needed to the index. The distribution became more
>> even at a later stage, after indexing 12 million Nutch documents: it is now
>> 35G / 35G / 56G / 324M / 51G. Still, one node holds only a small fraction
>> (324M) of what the other nodes have, and some nodes hold about double the
>> data of others (e.g., 56G vs. 35G). If increasing the
>> solandra.shards.at.once param would further improve the distribution, what
>> would I need to do to enforce that change while the cluster is running, now
>> that all the data has already been added to the index? And on the flip
>> side, if the change cannot be applied to existing data, what would happen
>> (to existing + new data) if the setting were changed and the servers were
>> restarted?
>>
>> Lastly, is there another mailing list I should be using for Solandra
>> questions? I couldn't find one....
>>
>> Thanks,
>> Safdar
>>
>>
>>
>>
>>
>> On Mon, Jun 25, 2012 at 4:16 AM, Jake Luciani <jakers@gmail.com> wrote:
>>
>>> Hi Safdar,
>>>
>>> If you want to get better utilization of the cluster raise the
>>> solandra.shards.at.once param in solandra.properties
>>>
>>> -Jake
>>>
>>>
>>>
>>> On Sun, Jun 24, 2012 at 11:00 AM, Safdar Kureishy <
>>> safdar.kureishy@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've searched online but was unable to find any leads for the problem
>>>> below. This mailing list seemed the most appropriate place. Apologies in
>>>> advance if that isn't the case.
>>>>
>>>> I'm running a 5-node Solandra cluster (Solr + Cassandra). I've set up
>>>> the nodes with tokens *evenly distributed across the token space* for a
>>>> 5-node cluster (as evidenced below under the "Effective-Ownership" column
>>>> of the "nodetool ring" output). My data is a set of a few million web
>>>> pages crawled with Nutch and indexed via the "solrindex" command
>>>> available through Nutch. AFAIK, the key for each document generated from
>>>> the crawled data is the URL.
>>>>
>>>> Based on the "Load" values for the nodes below, despite adding about 3
>>>> million web pages to this index via the HTTP REST API (e.g.:
>>>> http://9.9.9.x:8983/solandra/index/update....), some nodes are still
>>>> "empty". Specifically, nodes 9.9.9.1 and 9.9.9.3 hold just a few
>>>> kilobytes (shown in *bold* below) of the index, while the remaining 3
>>>> nodes are consistently getting hammered by all the data. If the
>>>> RandomPartitioner (which is what I'm using for this cluster) is supposed
>>>> to achieve an even distribution of keys across the token space, why is
>>>> the data below skewed in this fashion? Literally no key has yet been
>>>> hashed to nodes 9.9.9.1 and 9.9.9.3 below. Could someone possibly shed
>>>> some light on this?
>>>>
>>>> [me@hm1 solandra-app]$ bin/nodetool -h hm1 ring
>>>> Address   DC          Rack   Status  State   Load        Effective-Ownership  Token
>>>>                                                                              136112946768375385385349842972707284580
>>>> 9.9.9.0   datacenter1 rack1  Up      Normal  7.57 GB     20.00%              0
>>>> 9.9.9.1   datacenter1 rack1  Up      Normal  *21.44 KB*  20.00%              34028236692093846346337460743176821145
>>>> 9.9.9.2   datacenter1 rack1  Up      Normal  14.99 GB    20.00%              68056473384187692692674921486353642290
>>>> 9.9.9.3   datacenter1 rack1  Up      Normal  *50.79 KB*  20.00%              102084710076281539039012382229530463435
>>>> 9.9.9.4   datacenter1 rack1  Up      Normal  15.22 GB    20.00%              136112946768375385385349842972707284580
>>>>
>>>> Thanks in advance.
>>>>
>>>> Regards,
>>>> Safdar
>>>>
>>>
>>>
>>>
>>> --
>>> http://twitter.com/tjake
>>>
>>
>>
>
>
> --
> http://twitter.com/tjake
>
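Jake's point above — that a handful of random shard keys won't spread evenly over evenly spaced tokens, while a large number will — can be sketched with a quick simulation in the spirit of RandomPartitioner (the token math mimics Cassandra's MD5-based partitioner; the key generation and bucket logic here are illustrative assumptions, not Solandra's actual code):

```python
import hashlib
import random
import string

RING = 2**127  # RandomPartitioner token space is [0, 2**127)
# 5 evenly spaced node tokens, as in the nodetool ring output above
TOKENS = [i * RING // 5 for i in range(5)]

def token(key: bytes) -> int:
    # RandomPartitioner derives a token from the MD5 hash of the key
    return int.from_bytes(hashlib.md5(key).digest(), "big") % RING

def owner(t: int) -> int:
    # a key belongs to the first node whose token is >= the key's token,
    # wrapping around to the first node at the top of the ring
    for i, node_token in enumerate(TOKENS):
        if t <= node_token:
            return i
    return 0

def spread(n_keys: int, seed: int = 42) -> list[int]:
    """Count how many of n_keys random keys land on each of the 5 nodes."""
    rng = random.Random(seed)
    counts = [0] * 5
    for _ in range(n_keys):
        key = "".join(rng.choices(string.ascii_lowercase, k=12)).encode()
        counts[owner(token(key))] += 1
    return counts

# 5 keys (one per shard) can easily leave some nodes empty;
# 100,000 keys approach an even 20% per node.
print(spread(5))
print(spread(100_000))
```

With only 5 shard keys, the per-node counts are at the mercy of the hash, which matches the skewed "Load" column in the ring output; with many keys, the law of large numbers pulls every node toward 20%, which is why raising shards.at.once evens things out faster.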
