cassandra-user mailing list archives

From Marcel Steinbach <marcel.steinb...@chors.de>
Subject Re: Unbalanced cluster with RandomPartitioner
Date Fri, 20 Jan 2012 09:32:24 GMT
On 19.01.2012, at 20:15, Narendra Sharma wrote:
> I believe you need to move the nodes on the ring. What was the load on the nodes before
> you added 5 new nodes? It's just that you are getting data in certain token ranges more than
> others.
With three nodes, it was also imbalanced. 

What I don't understand is why the MD5 sums would generate such massive hot spots.

Most of our keys look like this:
00013270494972450001234567
with the first 16 digits being a timestamp of one of our application servers' startup times,
and the last 10 digits being sequentially generated per user.

There may be a lot of keys that start with e.g. "0001327049497245" (or some other timestamp).
But I was under the impression that MD5 doesn't care about such shared prefixes and generates
a uniform distribution anyway? But then again, I know next to nothing about MD5. Maybe someone
else has better insight into the algorithm?
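For a quick sanity check of that assumption, here is a minimal sketch (assuming RandomPartitioner's token is simply the absolute value of the 128-bit MD5 digest, i.e. a number in [0, 2^127)) that hashes a large batch of keys sharing one timestamp prefix and counts how many of their tokens land in each of eight equal token ranges:

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class Md5TokenSpread {
    public static void main(String[] args) throws Exception {
        // RandomPartitioner token space: [0, 2^127), split here into 8 equal ranges,
        // mirroring the 8 evenly spaced tokens in the ring quoted below.
        BigInteger tokenSpace = BigInteger.ONE.shiftLeft(127);
        int nodes = 8;
        long[] counts = new long[nodes];

        MessageDigest md5 = MessageDigest.getInstance("MD5");
        String prefix = "0001327049497245";                      // shared 16-digit timestamp prefix
        for (int user = 0; user < 1_000_000; user++) {
            String key = prefix + String.format("%010d", user);  // 10-digit sequential suffix
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            // Token = absolute value of the 128-bit MD5 digest.
            BigInteger token = new BigInteger(digest).abs();
            int bucket = token.multiply(BigInteger.valueOf(nodes)).divide(tokenSpace).intValue();
            counts[Math.min(bucket, nodes - 1)]++;
        }
        for (int i = 0; i < nodes; i++) {
            System.out.printf("range %d: %d keys%n", i, counts[i]);
        }
    }
}

If MD5 is as uniform as advertised, the eight counts should come out nearly identical, so a shared key prefix by itself should not produce hot spots.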

However, we also use CFs with a date ("yyyymmdd") as key, as well as CFs with UUIDs as keys.
And those CFs are not balanced in themselves either. E.g. node 5 has 12 GB live space used in
the CF with the UUID as key, and node 8 only 428 MB.

Cheers,
Marcel

> 
> On Thu, Jan 19, 2012 at 3:22 AM, Marcel Steinbach <marcel.steinbach@chors.de> wrote:
> On 18.01.2012, at 02:19, Maki Watanabe wrote:
>> Are there any significant differences in the number of sstables on each node?
> No, no significant difference there. Actually, node 8 is among those with more sstables
> but with the least load (20 GB).
> 
> On 17.01.2012, at 20:14, Jeremiah Jordan wrote:
>> Are you deleting data or using TTLs?  Expired/deleted data won't go away until the
>> sstable holding it is compacted.  So if compaction has happened on some nodes, but not on
>> others, you will see this.  The disparity is pretty big, 400 GB to 20 GB, so this probably
>> isn't the issue, but with our data using TTLs, if I run major compactions a couple of times
>> on that column family it can shrink ~30%-40%.
> Yes, we do delete data. But I agree, the disparity is too big to blame only the deletions.

> 
> Also, initially, we started out with 3 nodes and upgraded to 8 a few weeks ago. After
> adding the nodes, we did compactions and cleanups and still didn't end up with a balanced
> cluster. That should have removed outdated data, right?
> 
>> 2012/1/18 Marcel Steinbach <marcel.steinbach@chors.de>:
>>> We are running regular repairs, so I don't think that's the problem.
>>> And the data dir sizes match approx. the load reported by nodetool.
>>> Thanks for the advice, though.
>>> 
>>> Our keys are digits only, and all contain a few zeros at the same
>>> offsets. I'm not that familiar with the MD5 algorithm, but I doubt that it
>>> would generate 'hotspots' for those kinds of keys, right?
>>> 
>>> On 17.01.2012, at 17:34, Mohit Anchlia wrote:
>>> 
>>> Have you tried running repair first on each node? Also, verify using
>>> df -h on the data dirs
>>> 
>>> On Tue, Jan 17, 2012 at 7:34 AM, Marcel Steinbach
>>> <marcel.steinbach@chors.de> wrote:
>>> 
>>> Hi,
>>> 
>>> 
>>> we're using RP and have each node assigned the same amount of the token
>>> space. The cluster looks like this:
>>> 
>>> 
>>> Address  Status State   Load       Owns    Token
>>>                                             205648943402372032879374446248852460236
>>> 1        Up     Normal  310.83 GB  12.50%   56775407874461455114148055497453867724
>>> 2        Up     Normal  470.24 GB  12.50%   78043055807020109080608968461939380940
>>> 3        Up     Normal  271.57 GB  12.50%   99310703739578763047069881426424894156
>>> 4        Up     Normal  282.61 GB  12.50%   120578351672137417013530794390910407372
>>> 5        Up     Normal  248.76 GB  12.50%   141845999604696070979991707355395920588
>>> 6        Up     Normal  164.12 GB  12.50%   163113647537254724946452620319881433804
>>> 7        Up     Normal  76.23 GB   12.50%   184381295469813378912913533284366947020
>>> 8        Up     Normal  19.79 GB   12.50%   205648943402372032879374446248852460236
>>> 
>>> 
>>> I was under the impression the RP would distribute the load more evenly.
>>> 
>>> Our row sizes are 0.5-1 KB, so we don't store huge rows on a single
>>> node. Should we just move the nodes so that the load is more evenly
>>> distributed, or is there something off that needs to be fixed first?
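As a rough illustration of why equal token ranges should mean roughly equal key counts, here is a small sketch of how a row key maps onto that ring: the key's MD5 token goes to the first node whose token is greater than or equal to it, wrapping around to the lowest token. The names node1-node8 below are placeholders for the anonymized addresses in the output above.

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

public class RingLookup {
    // The eight tokens from the nodetool ring output above, keyed to placeholder node names.
    private static final TreeMap<BigInteger, String> RING = new TreeMap<>();
    static {
        String[] tokens = {
            "56775407874461455114148055497453867724",
            "78043055807020109080608968461939380940",
            "99310703739578763047069881426424894156",
            "120578351672137417013530794390910407372",
            "141845999604696070979991707355395920588",
            "163113647537254724946452620319881433804",
            "184381295469813378912913533284366947020",
            "205648943402372032879374446248852460236",
        };
        for (int i = 0; i < tokens.length; i++) {
            RING.put(new BigInteger(tokens[i]), "node" + (i + 1));
        }
    }

    // Primary node for a key: the first node whose token is >= the key's MD5 token,
    // wrapping around to the node with the lowest token at the end of the ring.
    static String primaryFor(String key) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                                     .digest(key.getBytes(StandardCharsets.UTF_8));
        BigInteger token = new BigInteger(digest).abs();
        Map.Entry<BigInteger, String> owner = RING.ceilingEntry(token);
        return (owner != null ? owner : RING.firstEntry()).getValue();
    }

    public static void main(String[] args) throws Exception {
        // Example key from the thread: timestamp prefix + sequential user suffix.
        System.out.println(primaryFor("00013270494972450001234567"));
    }
}

Run over a sample of real row keys, something like this should assign close to 12.5% of them to each node; if it does, the imbalance comes from per-key data sizes or from compaction/cleanup state rather than from the partitioner itself.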
>>> 
>>> 
>>> Thanks
>>> 
>>> Marcel
>>> 
>> 
>> 
>> 
>> -- 
>> w3m
> 
> 
> 
> 
> -- 
> Narendra Sharma
> Software Engineer
> http://www.aeris.com
> http://narendrasharma.blogspot.com/
> 
> 
 