From: Samuel CARRIERE
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: RE: Nodes not picking up data on repair, disk loaded unevenly
Date: Tue, 5 Jun 2012 21:38:43 +0200

Hi,

To verify that the repair was successful, you can look for this kind of message in the log:
 INFO [AntiEntropyStage:1] 2012-05-19 00:57:52,351 AntiEntropyService.java (line 762) [repair #e46a0a90-a13c-11e1-0000-596f3d333ab7] UsersCF is fully synced (3 remaining column family to sync for this session)
...
 INFO [AntiEntropyStage:1] 2012-05-19 00:59:25,348 AntiEntropyService.java (line 762) [repair #e46a0a90-a13c-11e1-0000-596f3d333ab7] MyOtherCF is fully synced (2 remaining column family to sync for this session)
...
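
If you prefer to check from the shell, something like this should do it (the
session id comes from your own log, and /var/log/cassandra/system.log is just
the default packaged location, so adjust both to your setup):

# list the "fully synced" messages for one repair session
$ grep "repair #e46a0a90-a13c-11e1-0000-596f3d333ab7" /var/log/cassandra/system.log | grep "fully synced"

# or follow the anti-entropy activity live while the repair runs
$ tail -f /var/log/cassandra/system.log | grep AntiEntropy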

To verify that one node "really" has the data it is supposed to have, you could isolate it from the rest of the cluster and query the data (with thrift) at CL ONE.
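
For example, with cassandra-cli pointed at the isolated node (the keyspace,
column family and row key below are placeholders, and this assumes your
cassandra-cli has the CONSISTENCYLEVEL command, which recent 1.x versions do):

$ cassandra-cli -h <isolated-node-ip> -p 9160
[default@unknown] use MyKeyspace;
[default@MyKeyspace] consistencylevel as ONE;
[default@MyKeyspace] get UsersCF['a_key_you_know_should_exist'];

Since the node is cut off from the other replicas, a read at CL ONE can only
be answered from its local data, so a hit there means the node really has
that row.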

Regards,
Samuel



Luke Hospadaruk <Luke.Hospadaruk@ithaka.org>
05/06/2012 20:53
To: user@cassandra.apache.org
Subject: Nodes not picking up data on repair, disk loaded unevenly

I have a 4-node cluster with one keyspace (aside from the system keyspace)
with the replication factor set to 4.  The disk usage between the nodes is
pretty wildly different and I'm wondering why.  It's becoming a problem
because one node is getting to the point where it sometimes fails to
compact because it doesn't have enough space.

I've been doing a lot of experimenting with the schema, adding/dropping
things, changing settings around (not ideal I realize, but we're still in
development).

In an ideal world, I'd launch another cluster (this is all hosted in
amazon), copy all the data to that, and just get rid of my current
cluster, but the current cluster is in use by some other parties so
rebuilding everything is impractical (although possible if it's the only
reliable solution).

$ nodetool -h localhost ring
Address      DC         Rack   Status  State   Load       Owns    Token
1.xx.xx.xx   Cassandra  rack1  Up      Normal  837.8 GB   25.00%  0
2.xx.xx.xx   Cassandra  rack1  Up      Normal  1.17 TB    25.00%  42535295865117307932921825928971026432
3.xx.xx.xx   Cassandra  rack1  Up      Normal  977.23 GB  25.00%  85070591730234615865843651857942052864
4.xx.xx.xx   Cassandra  rack1  Up      Normal  291.2 GB   25.00%  127605887595351923798765477786913079296

-Problems I'm having:
Nodes are running out of space and are apparently unable to perform
compactions because of it.  These machines have 1.7T total space each.

The logs for node #2 have a lot of warnings about insufficient space for
compaction.  Node number 4 was so extremely out of space (cassandra was
failing to start because of it) that I removed all the SSTables for one of
the less essential column families just to bring it back online.
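
(For concreteness, this is the kind of check I mean on each node; the log and
data paths are just the default packaged locations:)

# how many compaction-space warnings a node has logged
$ grep -ci "insufficient space" /var/log/cassandra/system.log

# versus what is actually free on the data volume
$ df -h /var/lib/cassandra/data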


I have (since I started noticing these issues) enabled compression for all
my column families.  On node #1 I was able to successfully run a scrub and
major compaction, so I suspect that the disk usage for node #1 is about
where all the other nodes should be.  At ~840GB I'm probably running close
to the max load I should have on a node, so I may need to launch more
nodes into the cluster, but I'd like to get things straightened out before
I introduce more potential issues (token moving, etc).
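
(The commands I mean are roughly these; <node1-ip> and MyKeyspace are
placeholders, and the scrub is what rewrites the existing SSTables so that
the newly enabled compression also applies to data already on disk:)

$ nodetool -h <node1-ip> scrub MyKeyspace     # rewrite SSTables with current CF settings
$ nodetool -h <node1-ip> compact MyKeyspace   # then a major compaction of the keyspace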

Node #4 seems not to be picking up all the data it should have (since
replication factor == number of nodes, the load should be roughly the
same?).  I've run repairs on that node to seemingly no avail (after repair
finishes, it still has about the same disk usage, which is much too low).
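
(One way to compare what each node actually holds, per column family, instead
of raw disk usage, which also counts un-compacted and snapshot data; MyCF is a
placeholder:)

$ for n in 1.xx.xx.xx 2.xx.xx.xx 3.xx.xx.xx 4.xx.xx.xx; do \
    echo "== $n =="; \
    nodetool -h $n cfstats | grep -A 3 "Column Family: MyCF"; \
  done
# the "Space used (live)" line is the number to compare across nodes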

-What I think the solution should be:
One node at a time (rough shell sketch after the list):
1) nodetool drain the node
2) shut down cassandra on the node
3) wipe out all the data in my keyspace on the node
4) bring cassandra back up
5) nodetool repair
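
(Rough shell sketch of the above; <node-ip>, MyKeyspace and the data path are
placeholders for whatever your setup uses:)

$ nodetool -h <node-ip> drain                     # 1) flush memtables, stop accepting writes
$ sudo service cassandra stop                     # 2) or however cassandra is managed on the box
$ rm -rf /var/lib/cassandra/data/MyKeyspace/*     # 3) wipe only this keyspace's SSTables
$ sudo service cassandra start                    # 4)
$ nodetool -h <node-ip> repair MyKeyspace         # 5) pull the data back from the other replicas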

-My concern:
This is basically what I did with node #4 (although I didn't drain, and I
didn't wipe the entire keyspace), and it doesn't seem to have regained all
the data it's supposed to have after the repair. The column family should have at least 200-300GB of data, and the SSTables in the data directory
only total about 11GB. Am I missing something?

Is there a way to verify that a node _really_ has all the data it's
supposed to have?

I don't want to do this process to each node and discover at the end of it
that I've lost a ton of data.

Is there something I should be looking for in the logs to verify that the repair was successful?  If I do a 'nodetool netstats' during the repair I
don't see any streams going in or out of node #4.
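
(This is the kind of check I mean while a repair is running; the log path is
the default packaged one:)

$ nodetool -h <node4-ip> netstats                 # should list streams while data is moving
$ grep -i "streaming" /var/log/cassandra/system.log | tail
$ grep "AntiEntropy" /var/log/cassandra/system.log | tail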

Thanks,
Luke

