cassandra-user mailing list archives

From Safdar Kureishy <safdar.kurei...@gmail.com>
Subject RandomPartitioner is providing a very skewed distribution of keys across a 5-node Solandra cluster
Date Sun, 24 Jun 2012 15:00:24 GMT
Hi,

I've searched online but was unable to find any leads for the problem
below. This mailing list seemed the most appropriate place. Apologies in
advance if that isn't the case.

I'm running a 5-node Solandra cluster (Solr + Cassandra). I've set up the
nodes with tokens *evenly distributed across the token space*, for a 5-node
cluster (as evidenced below under the "effective-ownership" column of the
"nodetool ring" output). My data is a set of a few million crawled web
pages, crawled using Nutch, and also indexed using the "solrindex" command
available through Nutch. AFAIK, the key for each document generated from
the crawled data is the URL.

Based on the "load" values for the nodes below, despite adding about 3
million web pages to this index via the HTTP REST API (e.g.:
http://9.9.9.x:8983/solandra/index/update....), some nodes are still
"empty". Specifically, nodes 9.9.9.1 and 9.9.9.3 hold just a few kilobytes
(shown in *bold* below) of the index, while the remaining 3 nodes are
consistently getting hammered by all the data. If the RandomPartitioner
(which is what I'm using for this cluster) is supposed to achieve an even
distribution of keys across the token space, why is the data below
skewed in this fashion? Literally, no key has yet been hashed to
nodes 9.9.9.1 and 9.9.9.3. Could someone possibly shed some light on
this absurdity?

[me@hm1 solandra-app]$ bin/nodetool -h hm1 ring
Address  DC           Rack   Status  State   Load        Effective-Owership  Token
                                                                             136112946768375385385349842972707284580
9.9.9.0  datacenter1  rack1  Up      Normal  7.57 GB     20.00%              0
9.9.9.1  datacenter1  rack1  Up      Normal  *21.44 KB*  20.00%              34028236692093846346337460743176821145
9.9.9.2  datacenter1  rack1  Up      Normal  14.99 GB    20.00%              68056473384187692692674921486353642290
9.9.9.3  datacenter1  rack1  Up      Normal  *50.79 KB*  20.00%              102084710076281539039012382229530463435
9.9.9.4  datacenter1  rack1  Up      Normal  15.22 GB    20.00%              136112946768375385385349842972707284580
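
For reference, here is a rough Python sketch of the distribution one would expect if each document's row key really were its URL. It rests on assumptions that are not confirmed anywhere in this message: that RandomPartitioner derives a token as the absolute value of the key's MD5 digest read as a signed 128-bit integer, that a node owns the range ending at its own token, and that the tokens were generated as multiples of 2**127 // 5 (which does reproduce the values in the ring output). The example.com URLs are made up for illustration.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

# Node addresses as listed in the ring output.
NODES = ["9.9.9.0", "9.9.9.1", "9.9.9.2", "9.9.9.3", "9.9.9.4"]

# Assumed token generation: multiples of 2**127 // 5, which matches the ring.
STEP = 2**127 // len(NODES)
TOKENS = [i * STEP for i in range(len(NODES))]


def token_for(key: str) -> int:
    """Assumed RandomPartitioner token: abs() of the signed 128-bit MD5 value."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def owner_of(key: str) -> str:
    """Return the first node whose token is >= the key's token, wrapping around."""
    idx = bisect_left(TOKENS, token_for(key))
    return NODES[idx % len(NODES)]


# Hypothetical URL keys, standing in for the crawled pages.
counts = Counter(owner_of(f"http://example.com/page/{i}") for i in range(100_000))
for node in NODES:
    print(node, counts[node])
```

Under these assumptions every node ends up with roughly a fifth of the keys, which is the behaviour the question expects and not what the "Load" column shows.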

Thanks in advance.

Regards,
Safdar
