cassandra-user mailing list archives

From Jake Luciani <jak...@gmail.com>
Subject Re: RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster
Date Mon, 25 Jun 2012 13:16:43 GMT
Hi Safdar,

Yes, you should make it a multiple.  The issue is that each shard 'sticks'
to a given node, but there is no way to guarantee that 5 random keys will
distribute equally across 5 nodes.  The idea is that they eventually will
as you add more and more keys, so increasing the number of shards created
at once can make that happen faster.  You can change this parameter and
restart the nodes without affecting your old data.
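Jake's point, that a handful of randomly hashed shard keys rarely splits
evenly across nodes while larger shard counts even out, can be seen with a
toy simulation (illustrative Python only; the `myindex~N` key form and the
`node_for` helper are assumptions for the sketch, not Solandra's actual
shard-routing code):

```python
import hashlib
from collections import Counter

NODES = 5

def node_for(shard_key: str) -> int:
    # Hash the shard key and see which fifth of the 128-bit MD5 space it
    # lands in -- loosely mimicking one shard "sticking" to one node.
    h = int.from_bytes(hashlib.md5(shard_key.encode()).digest(), "big")
    return h * NODES // 2 ** 128

for shards in (5, 20, 100):
    counts = Counter(node_for(f"myindex~{i}") for i in range(shards))
    per_node = [counts.get(n, 0) for n in range(NODES)]
    print(f"{shards:3d} shards -> shards per node: {per_node}")
```

With only 5 shards the split is usually lopsided; with 100 it is close to
even, which is why raising shards.at.once helps.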

If you have more issues, please raise them on the GitHub issues tab for Solandra.

-Jake

On Mon, Jun 25, 2012 at 2:23 AM, Safdar Kureishy
<safdar.kureishy@gmail.com>wrote:

> Hi Jake,
>
> Thanks. Yes, I also forgot to mention that I had raised the
> solandra.shards.at.once param from 4 to 5 (to match the number of nodes).
> Should I have raised it to 10 or 15 (a multiple of 5)? I have now added all
> the documents that I needed to the index. It appears the distribution
> became more even at a later stage, after indexing 12 million Nutch
> documents. The distribution is now 35G / 35G / 56G / 324M / 51G, but there
> is still one node that holds a small fraction (i.e., 324M) of what the
> other nodes hold. In addition, some nodes also have about double the data
> of others (e.g., 56G vs. 35G). If you think that increasing the
> solandra.shards.at.once param will further improve the distribution, what
> would I need to do to enforce that change while the cluster is running, now
> that all the data has already been added to the index? And on the flip
> side, if the change cannot be made for existing data, what would happen (to
> existing + new data) if the setting were changed and the servers were
> restarted?
>
> Lastly, is there another mailing list I should be using for Solandra
> questions? I couldn't find one....
>
> Thanks,
> Safdar
>
>
>
>
>
> On Mon, Jun 25, 2012 at 4:16 AM, Jake Luciani <jakers@gmail.com> wrote:
>
>> Hi Safdar,
>>
>> If you want to get better utilization of the cluster raise the
>> solandra.shards.at.once param in solandra.properties
>>
>> -Jake
>>
>>
>>
>> On Sun, Jun 24, 2012 at 11:00 AM, Safdar Kureishy <
>> safdar.kureishy@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I've searched online but was unable to find any leads for the problem
>>> below. This mailing list seemed the most appropriate place. Apologies in
>>> advance if that isn't the case.
>>>
>>> I'm running a 5-node Solandra cluster (Solr + Cassandra). I've set up the
>>> nodes with tokens *evenly distributed across the token space* for a
>>> 5-node cluster (as evidenced below under the "Effective-Ownership" column
>>> of the "nodetool ring" output). My data set is a few million web pages
>>> crawled using Nutch and indexed using the "solrindex" command available
>>> through Nutch. AFAIK, the key for each document generated from the
>>> crawled data is the URL.
>>>
>>> Based on the "load" values for the nodes below, despite adding about 3
>>> million web pages to this index via the HTTP REST API (e.g.:
>>> http://9.9.9.x:8983/solandra/index/update....), some nodes are still
>>> "empty". Specifically, nodes 9.9.9.1 and 9.9.9.3 hold just a few kilobytes
>>> (shown in *bold* below) of the index, while the remaining 3 nodes are
>>> consistently getting hammered by all the data. If the RandomPartitioner
>>> (which is what I'm using for this cluster) is supposed to achieve an even
>>> distribution of keys across the token space, why is the data below skewed
>>> in this fashion? Literally, no key has yet been hashed to nodes 9.9.9.1
>>> and 9.9.9.3 below. Could someone possibly shed some light on this
>>> absurdity?
>>>
>>> [me@hm1 solandra-app]$ bin/nodetool -h hm1 ring
>>> Address  DC          Rack   Status  State   Load        Effective-Ownership  Token
>>>                                                                              136112946768375385385349842972707284580
>>> 9.9.9.0  datacenter1 rack1  Up      Normal  7.57 GB     20.00%               0
>>> 9.9.9.1  datacenter1 rack1  Up      Normal  *21.44 KB*  20.00%               34028236692093846346337460743176821145
>>> 9.9.9.2  datacenter1 rack1  Up      Normal  14.99 GB    20.00%               68056473384187692692674921486353642290
>>> 9.9.9.3  datacenter1 rack1  Up      Normal  *50.79 KB*  20.00%               102084710076281539039012382229530463435
>>> 9.9.9.4  datacenter1 rack1  Up      Normal  15.22 GB    20.00%               136112946768375385385349842972707284580
>>>
>>> Thanks in advance.
>>>
>>> Regards,
>>> Safdar
>>>
>>
>>
>>
>> --
>> http://twitter.com/tjake
>>
>
>


-- 
http://twitter.com/tjake
