"I see a lot of hinted handoff compactions too."

I may not have been clear enough: I see a lot of "compaction of system.hints" messages, which I interpret as a sign that a lot of data couldn't reach its destination.


2013/6/4 Alain RODRIGUEZ <arodrime@gmail.com>
Hi,

I have had an issue since switching to multiple DCs. I use AWS EC2 instances with C* 1.2.2: 12 nodes in eu-west + 6 nodes in us-east (new DC).

Datacenter: eu-west
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Owns   Host ID
UN  public ip      133.43 GB  8.3%   ae33d60c-1c24-4c10-b58c-59d06faac5ca
UN  public ip      171.3 GB   8.3%   bb94c428-c98d-454d-af80-6612548a8125
UN  public ip      140.26 GB  8.3%   136bbced-25ed-4a37-abd9-7ab0d146d1c7
UN  public ip      132.14 GB  8.3%   086ebf3e-c58f-4b76-b4d5-6600f7b79cf7
UN  public ip      178.26 GB  8.3%   9255d30f-848f-4251-800b-2c61b4e0cfbf
UN  public ip      153.79 GB  8.3%   7b4fd83a-ca9c-4115-b146-222ab040abd6
UN  public ip      146.82 GB  8.3%   bf233d59-d7a4-482f-adaf-d48531d16305
UN  public ip      151.1 GB   8.3%   fa3b617d-5d31-4db2-87bf-494ee8a9f95f
UN  public ip      131.78 GB  8.3%   dac399dc-ac7c-4ee3-9503-f55e8a9f1675
UN  public ip      130.18 GB  8.3%   56b8654a-f8b3-43d4-8b15-2e74d5dfe81b
UN  public ip     161.96 GB  8.3%   97624d02-ba48-42e7-88f7-2d3b0175d6ef
UN  public ip     130.26 GB  8.3%   868c45b3-4afc-43db-b2d0-5c0f89d018fb
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Owns   Host ID
UN  public ip        246.74 GB  0.0%   212888f6-ecf8-4953-8f83-c5653fb176cb
UN  public ip        320.15 GB  0.0%   bcd696da-433b-4e6b-8030-11629eaf5b84
UN  public ip        353.22 GB  0.0%   3f5cb04a-3ac3-46f3-b101-31a9ae7682bc
UN  public ip        348.91 GB  0.0%   836b3b76-418a-4a22-bab4-c1a0bd49de65
UN  public ip        269.37 GB  0.0%   9408c7ff-ec47-4824-af81-92aa311a1984
UN  public ip        244.94 GB  0.0%   668eb3ca-8ee4-40ae-98e7-987c471bd675

Each node of the new DC owns 0% (according to the status view above). Running "nodetool ring myks" gives me:

Datacenter: eu-west
==========
Replicas: 3

Address         Rack        Status State   Load            Owns                Token
public ip    1b          Up     Normal  131.78 GB       25.00%              113427455640312821154458202477256070485
public ip    1b          Up     Normal  161.96 GB       25.00%              141784319550391026443072753096570088106
public ip    1b          Up     Normal  153.43 GB       25.00%              70892159775195513221536376548285044053
public ip    1b          Up     Normal  151.1 GB        25.00%              99249023685273718510150927167599061674
public ip    1b          Up     Normal  130.26 GB       25.00%              155962751505430129087380028406227096917
public ip    1b          Up     Normal  146.82 GB       25.00%              85070591730234615865843651857942052864
public ip    1b          Up     Normal  171.35 GB       25.00%              14178431955039102644307275309657008810
public ip    1b          Up     Normal  132.14 GB       25.00%              42535295865117307932921825928971026432
public ip    1b          Up     Normal  140.26 GB       25.00%              28356863910078205288614550619314017621
public ip    1b          Up     Normal  133.43 GB       25.00%              0
public ip    1b          Up     Normal  130.18 GB       25.00%              127605887595351923798765477786913079296
public ip    1b          Up     Normal  178.27 GB       25.00%              56713727820156410577229101238628035242

Datacenter: us-east
==========
Replicas: 3

Address         Rack        Status State   Load            Owns                Token
                                                                               100
public ip   1b          Up     Normal  320.15 GB       50.00%              28356863910078205288614550619314017721
public ip   1b          Up     Normal  353.14 GB       50.00%              56713727820156410577229101238628035342
public ip   1b          Up     Normal  348.35 GB       50.00%              85070591730234615865843651857942052964
public ip   1b          Up     Normal  269.35 GB       50.00%              113427455640312821154458202477256070585
public ip   1b          Up     Normal  244.94 GB       50.00%              141784319550391026443072753096570088206
public ip   1b          Up     Normal  246.74 GB       50.00%              100

This seems to be ok.
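
As a sanity check on the tokens above, here is a minimal sketch of how they can be recomputed. It assumes RandomPartitioner's token range of 0 to 2**127 and that the us-east tokens are simply the evenly spaced values shifted by 100 to avoid collisions with eu-west; the names RING_SIZE and dc_tokens are mine, for illustration only.

```python
# Evenly spaced RandomPartitioner tokens for one datacenter.
# Assumption: the token range is 0 .. 2**127, and a small per-DC offset
# is used only to keep tokens unique across datacenters.
RING_SIZE = 2 ** 127


def dc_tokens(node_count, offset=0):
    """Return node_count evenly spaced tokens, each shifted by offset."""
    return [i * RING_SIZE // node_count + offset for i in range(node_count)]


eu_west = dc_tokens(12)              # 12 nodes, no offset
us_east = dc_tokens(6, offset=100)   # 6 nodes, shifted by 100

for name, tokens in (("eu-west", eu_west), ("us-east", us_east)):
    print(name)
    for token in tokens:
        print("   ", token)
```

The values this produces can be compared by hand against the Token column of the ring output for each DC.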

When I run "describe cluster;" in cassandra-cli on an eu-west node:

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2MultiRegionSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
        e968865b-3b96-3c87-af0a-6294067a832f: [My 18 publics ip]

So far so good.
Now from a us-east node:

[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.Ec2MultiRegionSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
        UNREACHABLE: [public ip of the node itself]

        e968865b-3b96-3c87-af0a-6294067a832f: [17 others publics ip]


Why is this node not able to see itself? What port / service is in use when describing the cluster? I have tried opening all ports, with no success. I also tried the following script to help the node find itself, but it doesn't seem to work...

--------------------- script ---------------------------------------------------------------------------------------
#!/bin/bash
/sbin/ifconfig eth0:1 $PUBLIC_IP netmask 255.255.255.255 broadcast $PUBLIC_IP

--------------------- end of script --------------------------------------------------------------------------------------

eth0:1    Link encap:Ethernet  HWaddr 12:31:39:22:c1:41
          inet addr:xx.xx.xx.xx  Bcast:xx.xx.xx.xx  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:47
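
To narrow this down, here is a minimal sketch of a TCP reachability check that could be run on the us-east node against its own public IP. It assumes the schema check travels over the inter-node storage port (7000 by default in cassandra.yaml, 7001 for SSL), which I have not confirmed; the function name can_connect is hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, can_connect("<public ip of the node>", 7000) run on the node itself would show whether the node can open a connection to its own public address on that port.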


I see a lot of hinted handoff compactions too.

Any clue about what's wrong?