From: Samuel CARRIERE
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: RE: Nodes not picking up data on repair, disk loaded unevenly
Date: Tue, 5 Jun 2012 21:38:43 +0200

Hi,

To verify that the repair was successful, you can look for this kind of message in the log:
 INFO [AntiEntropyStage:1] 2012-05-19 00:57:52,351 AntiEntropyService.java (line 762) [repair #e46a0a90-a13c-11e1-0000-596f3d333ab7] UsersCF is fully synced (3 remaining column family to sync for this session)
...
 INFO [AntiEntropyStage:1] 2012-05-19 00:59:25,348 AntiEntropyService.java (line 762) [repair #e46a0a90-a13c-11e1-0000-596f3d333ab7] MyOtherCF is fully synced (2 remaining column family to sync for this session)
...
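
If you prefer to check from the shell, something like this should do it (the
session id comes from your own log, and /var/log/cassandra/system.log is just
the default packaged location, so adjust both to your setup):

# list the "fully synced" messages for one repair session
$ grep "repair #e46a0a90-a13c-11e1-0000-596f3d333ab7" /var/log/cassandra/system.log | grep "fully synced"

# or follow the anti-entropy activity live while the repair runs
$ tail -f /var/log/cassandra/system.log | grep AntiEntropy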

To verify that one node "really" has the data it is supposed to have, you could isolate it from the rest of the cluster and query the data (with thrift) at CL ONE.
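
For example, with cassandra-cli pointed at the isolated node (the keyspace,
column family and row key below are placeholders, and this assumes your
cassandra-cli has the CONSISTENCYLEVEL command, which recent 1.x versions do):

$ cassandra-cli -h <isolated-node-ip> -p 9160
[default@unknown] use MyKeyspace;
[default@MyKeyspace] consistencylevel as ONE;
[default@MyKeyspace] get UsersCF['a_key_you_know_should_exist'];

Since the node is cut off from the other replicas, a read at CL ONE can only
be answered from its local data, so a hit there means the node really has
that row.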

Regards,
Samuel



Luke Hospadaruk <Luke.Hospadaruk@ithaka.org>
05/06/2012 20:53
To: user@cassandra.apache.org
Subject: Nodes not picking up data on repair, disk loaded unevenly

I have a 4-node cluster with one keyspace (aside from the system keyspace)
with the replication factor set to 4.  The disk usage between the nodes is
pretty wildly different and I'm wondering why.  It's becoming a problem
because one node is getting to the point where it sometimes fails to
compact because it doesn't have enough space.

I've been doing a lot of experimenting with the schema, adding/dropping
things, changing settings around (not ideal I realize, but we're still in
development).

In an ideal world, I'd launch another cluster (this is all hosted in
amazon), copy all the data to that, and just get rid of my current
cluster, but the current cluster is in use by some other parties so
rebuilding everything is impractical (although possible if it's the only
reliable solution).

$ nodetool -h localhost ring
Address      DC         Rack   Status  State   Load       Owns    Token
1.xx.xx.xx   Cassandra  rack1  Up      Normal  837.8 GB   25.00%  0
2.xx.xx.xx   Cassandra  rack1  Up      Normal  1.17 TB    25.00%  42535295865117307932921825928971026432
3.xx.xx.xx   Cassandra  rack1  Up      Normal  977.23 GB  25.00%  85070591730234615865843651857942052864
4.xx.xx.xx   Cassandra  rack1  Up      Normal  291.2 GB   25.00%  127605887595351923798765477786913079296

-Problems I'm having:
Nodes are running out of space and are apparently unable to perform
compactions because of it.  These machines have 1.7T total space each.

The logs for node #2 have a lot of warnings about insufficient space for
compaction.  Node number 4 was so extremely out of space (cassandra was
failing to start because of it) that I removed all the SSTables for one of
the less essential column families just to bring it back online.
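
(For concreteness, this is the kind of check I mean on each node; the log and
data paths are just the default packaged locations:)

# how many compaction-space warnings a node has logged
$ grep -ci "insufficient space" /var/log/cassandra/system.log

# versus what is actually free on the data volume
$ df -h /var/lib/cassandra/data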


I have (since I started noticing these issues) enabled compression for all
my column families.  On node #1 I was able to successfully run a scrub and
major compaction, so I suspect that the disk usage for node #1 is about
where all the other nodes should be.  At ~840GB I'm probably running close
to the max load I should have on a node, so I may need to launch more
nodes into the cluster, but I'd like to get things straightened out before
I introduce more potential issues (token moving, etc).
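
(The commands I mean are roughly these; <node1-ip> and MyKeyspace are
placeholders, and the scrub is what rewrites the existing SSTables so that
the newly enabled compression also applies to data already on disk:)

$ nodetool -h <node1-ip> scrub MyKeyspace     # rewrite SSTables with current CF settings
$ nodetool -h <node1-ip> compact MyKeyspace   # then a major compaction of the keyspace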

Node #4 seems not to be picking up all the data it should have (since
replication factor == number of nodes, the load should be roughly the
same?).  I've run repairs on that node to seemingly no avail (after repair
finishes, it still has about the same disk usage, which is much too low).
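
(One way to compare what each node actually holds, per column family, instead
of raw disk usage, which also counts un-compacted and snapshot data; MyCF is a
placeholder:)

$ for n in 1.xx.xx.xx 2.xx.xx.xx 3.xx.xx.xx 4.xx.xx.xx; do \
    echo "== $n =="; \
    nodetool -h $n cfstats | grep -A 3 "Column Family: MyCF"; \
  done
# the "Space used (live)" line is the number to compare across nodes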

-What I think the solution should be:
One node at a time (rough shell sketch after the list):
1) nodetool drain the node
2) shut down cassandra on the node
3) wipe out all the data in my keyspace on the node
4) bring cassandra back up
5) nodetool repair
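
(Rough shell sketch of the above; <node-ip>, MyKeyspace and the data path are
placeholders for whatever your setup uses:)

$ nodetool -h <node-ip> drain                     # 1) flush memtables, stop accepting writes
$ sudo service cassandra stop                     # 2) or however cassandra is managed on the box
$ rm -rf /var/lib/cassandra/data/MyKeyspace/*     # 3) wipe only this keyspace's SSTables
$ sudo service cassandra start                    # 4)
$ nodetool -h <node-ip> repair MyKeyspace         # 5) pull the data back from the other replicas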

-My concern:
This is basically what I did with node #4 (although I didn't drain, and I
didn't wipe the entire keyspace), and it doesn't seem to have regained all
the data it's supposed to have after the repair. The column family should have at least 200-300GB of data, and the SSTables in the data directory
only total about 11GB. Am I missing something?

Is there a way to verify that a node _really_ has all the data it's
supposed to have?

I don't want to do this process to each node and discover at the end of it
that I've lost a ton of data.

Is there something I should be looking for in the logs to verify that the repair was successful?  If I do a 'nodetool netstats' during the repair I
don't see any streams going in or out of node #4.
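
(This is the kind of check I mean while a repair is running; the log path is
the default packaged one:)

$ nodetool -h <node4-ip> netstats                 # should list streams while data is moving
$ grep -i "streaming" /var/log/cassandra/system.log | tail
$ grep "AntiEntropy" /var/log/cassandra/system.log | tail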

Thanks,
Luke

