cassandra-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Data Distribution in Table/Column Family
Date Thu, 27 Aug 2015 15:13:02 GMT
Even if the data were perfectly evenly distributed, that wouldn't guarantee
that the hash values of the partition keys used in your client queries
won't collide and result in a hotspot.

Another possibility is that your data is not partitioned well at the
primary key level. Are you using clustering keys? Only the partition key
portion of the primary key is used to produce the hash/token value that
selects the node. Sometimes you need to use composite partition keys to
ensure that rows are better distributed for particular access patterns.
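
To illustrate why only the partition key matters, here is a minimal Python
sketch. Everything in it is an assumption made for illustration: the
four-node cluster, the sensor table, the MD5 stand-in for the real
partitioner, and the modulo placement (a real cluster uses a token ring).
It only shows that rows sharing a partition key always land on the same
node, while a composite partition key can spread them out.

```python
import hashlib
from collections import Counter

# Hypothetical 4-node cluster; real clusters assign token ranges, not modulo slots.
NODES = ["node1", "node2", "node3", "node4"]

def token(partition_key: tuple) -> int:
    """Stand-in for the partitioner: hashes ONLY the partition key columns."""
    raw = "|".join(str(part) for part in partition_key).encode("utf-8")
    return int.from_bytes(hashlib.md5(raw).digest()[:8], "big")

def owner(partition_key: tuple) -> str:
    """Map a token to a node (simplified placement, an assumption)."""
    return NODES[token(partition_key) % len(NODES)]

# Hypothetical workload: one busy sensor writing one row per hour for 30 days.
rows = [("sensor-42", day, hour) for day in range(30) for hour in range(24)]

# Schema A: PRIMARY KEY (sensor_id, day, hour)
# The partition key is sensor_id alone; day and hour are clustering keys.
single = Counter(owner((sensor,)) for sensor, day, hour in rows)

# Schema B: PRIMARY KEY ((sensor_id, day), hour)
# The composite partition key (sensor_id, day) yields one partition per day.
composite = Counter(owner((sensor, day)) for sensor, day, hour in rows)

print("single partition key:   ", dict(single))
print("composite partition key:", dict(composite))
```

With schema A, every row for the busy sensor maps to one node no matter how
many clustering key values there are. With schema B, the same rows are
split into 30 partitions that can land on different nodes.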

-- Jack Krupansky

On Thu, Aug 27, 2015 at 11:03 AM, Alain RODRIGUEZ <arodrime@gmail.com>
wrote:

> Hi,
>
> Did you try running the following on all your nodes and comparing the
> results?
>
> du -sh /*whatever*/cassandra/data/*
>
> Of course, if your snapshot sizes are unequal, exclude them from the above
> command (or simply delete the snapshots).
>
> This should give a rough answer to your question about whether the data is
> evenly distributed. Note that a deviation of a few MB or GB, depending on
> your total data size, can happen without being a real issue; I would say
> up to 5-15% on a big enough dataset.
>
> Also, "nodetool cfstats" gives you an approximation of the number of rows
> and the space used (run it on each node), among other useful information.
>
> But the main thing to do is to double-check your table models to see
> whether your workflow could create a hotspot on any of them. You should be
> able to guess whether one of your tables is badly distributed, imho.
>
> C*heers,
>
> Alain
>
> 2015-08-27 15:43 GMT+02:00 Saladi Naidu <naidusp2002@yahoo.com>:
>
>> Is there a way to find out how the data in a column family is distributed
>> across nodes? Nodetool shows how data is distributed across nodes, but
>> only as the total for each node, not per table. We are seeing heavy load
>> on one node, and I suspect that partitioning is not distributing data
>> equally. But to prove that to the development team, we need the stats for
>> that table.
>>
>> Naidu Saladi
>>
>
>
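
The per-node comparison Alain suggests can be sketched in Python as
follows. This is only an illustration: the node names and sizes are made-up
numbers standing in for the du output collected from each node for one
table, and the 15% threshold is simply the upper end of the 5-15% range
mentioned above, not an official limit.

```python
def deviation_from_mean(sizes_gb: dict) -> dict:
    """Percent deviation of each node's on-disk size from the cluster mean."""
    mean = sum(sizes_gb.values()) / len(sizes_gb)
    return {node: (size - mean) / mean * 100 for node, size in sizes_gb.items()}

def outliers(sizes_gb: dict, threshold_pct: float = 15.0) -> list:
    """Nodes whose size deviates from the mean by more than threshold_pct."""
    deviations = deviation_from_mean(sizes_gb)
    return sorted(node for node, dev in deviations.items() if abs(dev) > threshold_pct)

# Hypothetical sizes in GB for one table's data directory, typed in by hand
# from the du output of each node (snapshots excluded).
sizes = {"node1": 100.0, "node2": 104.0, "node3": 96.0, "node4": 140.0}

for node, dev in sorted(deviation_from_mean(sizes).items()):
    print(f"{node}: {dev:+.1f}% vs. mean")
print("above threshold:", outliers(sizes))
```

A node that stands out here is a hint rather than proof: disk size says
nothing about read or write load, so a hot partition can still hide behind
evenly sized data directories.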
