Can you run nodetool repair on all the nodes first and look at the keys?

On Thu, Sep 19, 2013 at 1:22 PM, Suruchi Deodhar <suruchi.deodhar@generalsentiment.com> wrote:
Yes, the key distribution does vary across the nodes. For example, on the node with the most data, Number of Keys (estimate) is 6527744 for a particular column family, whereas for the same column family on the node with the least data, Number of Keys (estimate) is 3840.

Is there a way to control this distribution by setting some Cassandra parameter?

I am using the Murmur3 partitioner with NetworkTopologyStrategy.

Thanks,
Suruchi



On Thu, Sep 19, 2013 at 3:59 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
Can you check cfstats to see the number of keys per node?
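
For comparing the key estimates across nodes, a small script can pull them out of saved cfstats output. This is only a sketch: it assumes the output contains lines of the form "Keyspace: <name>", "Column Family: <name>" and "Number of Keys (estimate): <n>", and the function name keys_per_column_family is made up for illustration. The exact layout of cfstats output can differ between Cassandra versions, so the patterns may need adjusting.

```python
import re

# Assumed line shapes in `nodetool cfstats` output (may vary by version).
KEYSPACE = re.compile(r"^Keyspace:\s*(\S+)")
COLUMN_FAMILY = re.compile(r"^Column Family:\s*(\S+)")
KEYS = re.compile(r"^Number of Keys \(estimate\):\s*(\d+)")


def keys_per_column_family(cfstats_text):
    """Map (keyspace, column family) -> estimated number of keys."""
    counts = {}
    keyspace = None
    column_family = None
    for raw in cfstats_text.splitlines():
        line = raw.strip()
        m = KEYSPACE.match(line)
        if m:
            # A new keyspace section starts; forget the previous column family.
            keyspace, column_family = m.group(1), None
            continue
        m = COLUMN_FAMILY.match(line)
        if m:
            column_family = m.group(1)
            continue
        m = KEYS.match(line)
        if m and column_family is not None:
            counts[(keyspace, column_family)] = int(m.group(1))
    return counts
```

Running this on the cfstats output saved from each node and comparing the resulting dictionaries shows which column families are unevenly spread.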


On Thu, Sep 19, 2013 at 12:36 PM, Suruchi Deodhar <suruchi.deodhar@generalsentiment.com> wrote:
Thanks for your replies. I wiped out my data from the cluster and also cleared the commitlog before restarting it with num_tokens=256. I then uploaded data using sstableloader.

However, I still do not see a uniform distribution of data across the nodes of the cluster.

The output of the bin/nodetool -h localhost status command is shown below. Some nodes hold as little as 478.45 KB, while others hold as much as 1.05 GB.

Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.238.133.174  856.66 MB  256     8.4%              e41d8863-ce37-4d5c-a428-bfacea432a35  1a
UN  10.238.133.97   439.02 MB  256     7.7%              1bf42b5e-4aed-4b06-bdb3-65a78823b547  1a
UN  10.151.86.146   1.05 GB    256     8.0%              8952645d-4a27-4670-afb2-65061c205734  1a
UN  10.138.10.9     912.57 MB  256     8.6%              25ccea82-49d2-43d9-830c-b9c9cee026ec  1a
UN  10.87.87.240    70.85 MB   256     8.6%              ea066827-83bc-458c-83e8-bd15b7fc783c  1b
UN  10.93.5.157     60.56 MB   256     7.6%              4ab9111c-39b4-4d15-9401-359d9d853c16  1b
UN  10.92.231.170   866.73 MB  256     9.3%              a18ce761-88a0-4407-bbd1-c867c4fecd1f  1b
UN  10.238.137.250  533.77 MB  256     7.8%              84301648-afff-4f06-aa0b-4be421e0d08f  1a
UN  10.93.91.139    478.45 KB  256     8.1%              682dd848-7c7f-4ddb-a960-119cf6491aa1  1b
UN  10.138.2.20     1.12 MB    256     7.9%              a6d4672a-0915-4c64-ba47-9f190abbf951  1a
UN  10.93.31.44     282.65 MB  256     7.8%              67a6c0a6-e89f-4f3e-b996-cdded1b94faf  1b
UN  10.236.138.169  223.66 MB  256     9.1%              cbbf27b0-b53a-4530-bfdf-3764730b89d8  1a
UN  10.137.7.90     11.36 MB   256     7.4%              17b79aa7-64fc-4e16-b96a-955b0aae9bb4  1a
UN  10.93.77.166    837.64 MB  256     8.8%              9a821d1e-40e5-445d-b6b7-3cdd58bdb8cb  1b
UN  10.120.249.140  838.59 MB  256     9.4%              e1fb69b0-8e66-4deb-9e72-f901d7a14e8a  1b
UN  10.90.246.128   216.75 MB  256     8.4%              054911ec-969d-43d9-aea1-db445706e4d2  1b
UN  10.123.95.248   147.1 MB   256     7.2%              a17deca1-9644-4520-9e62-ac66fc6fef60  1b
UN  10.136.11.40    4.24 MB    256     8.5%              66be1173-b822-40b5-b650-cb38ae3c7a51  1a
UN  10.87.90.42     11.56 MB   256     8.0%              dac0c6ea-56c6-44da-a4ec-6388f39ecba1  1b
UN  10.87.75.147    549 MB     256     8.3%              ac060edf-dc48-44cf-a1b5-83c7a465f3c8  1b
UN  10.151.49.88    119.86 MB  256     8.9%              57043573-ab1b-4e3c-8044-58376f7ce08f  1a
UN  10.87.83.107    484.3 MB   256     8.3%              0019439b-9f8a-4965-91b8-7108bbb55593  1b
UN  10.137.20.183   137.67 MB  256     8.4%              15951592-8ab2-473d-920a-da6e9d99507d  1a
UN  10.238.170.159  49.17 MB   256     9.4%              32ce322e-4f7c-46c7-a8ce-bd73cdd54684  1a

Is there something else that I should be doing differently?

Thanks for your help!

Suruchi
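
The gap between the Load and Owns columns in the status output above can be put into numbers with a short script. The sketch below assumes rows shaped like the ones in that output (state, address, load with a unit, token count, ownership percentage) and that the units are bytes, KB, MB, GB or TB with 1024-based multipliers; parse_status and skew are illustrative names, not part of nodetool.

```python
import re

# Assumed 1024-based multipliers for the units shown in the Load column.
UNITS = {"bytes": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# Matches rows such as:
# UN  10.93.91.139  478.45 KB  256  8.1%  <host id>  1b
ROW = re.compile(
    r"^[UD][NLJM]\s+(?P<addr>\S+)\s+(?P<size>[\d.]+)\s+"
    r"(?P<unit>bytes|KB|MB|GB|TB)\s+(?P<tokens>\d+)\s+(?P<owns>[\d.]+)%"
)


def parse_status(text):
    """Return a list of (address, load_bytes, owns_percent) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            load = float(m.group("size")) * UNITS[m.group("unit")]
            rows.append((m.group("addr"), load, float(m.group("owns"))))
    return rows


def skew(rows):
    """Return (max load / min load, max owns / min owns)."""
    loads = [r[1] for r in rows]
    owns = [r[2] for r in rows]
    return max(loads) / min(loads), max(owns) / min(owns)


if __name__ == "__main__":
    import sys

    # Usage: bin/nodetool -h localhost status | python this_script.py
    load_ratio, owns_ratio = skew(parse_status(sys.stdin.read()))
    print(f"load max/min: {load_ratio:.1f}x, owns max/min: {owns_ratio:.2f}x")
```

A large load ratio next to an owns ratio close to 1 suggests the token ranges are balanced and the imbalance lies in the data that actually reached each node.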



On Thu, Sep 19, 2013 at 3:20 PM, Richard Low <richard@wentnet.com> wrote:
The only thing you need to guarantee is that Cassandra doesn't start with num_tokens=1 (the default in 1.2.x) or, if it does, that you wipe all the data before starting it with higher num_tokens.
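
One way to check this before a node's first start is to look for an uncommented num_tokens line in cassandra.yaml. The sketch below is a hypothetical helper, not a Cassandra tool: it assumes num_tokens appears as a top-level "num_tokens: <n>" line and treats a missing or commented-out line as "not set", in which case the default described above would apply. The location of cassandra.yaml depends on the installation.

```python
import re

# Top-level, uncommented setting such as "num_tokens: 256".
NUM_TOKENS = re.compile(r"^num_tokens:\s*(\d+)")


def configured_num_tokens(yaml_text):
    """Return the uncommented num_tokens value, or None if it is not set."""
    for line in yaml_text.splitlines():
        m = NUM_TOKENS.match(line)
        if m:
            return int(m.group(1))
    return None


def safe_for_first_start(yaml_text, minimum=2):
    """True only if num_tokens is explicitly set to at least `minimum`."""
    value = configured_num_tokens(yaml_text)
    return value is not None and value >= minimum
```

Reading the file with open(path).read() and passing the text to safe_for_first_start on every node would flag any node that is about to start with a single token.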


On 19 September 2013 19:07, Robert Coli <rcoli@eventbrite.com> wrote:
On Thu, Sep 19, 2013 at 10:59 AM, Suruchi Deodhar <suruchi.deodhar@generalsentiment.com> wrote:
Do you suggest I try some other installation mechanism? Are there any known problems with the tar installation of Cassandra 1.2.9 that I should be aware of?

I was asking in the context of this JIRA:


Which does not seem to apply in your case!

=Rob