From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: nodetool status inconsistencies, repair performance and system keyspace compactions
Date: Fri, 5 Apr 2013 22:41:02 +0530

Monitor the repair using nodetool compactionstats to see the Merkle trees being created, and nodetool netstats to see the data streaming.

Also look in the logs for messages from AntiEntropyService.java; those will tell you how long the node waited for each replica to get back to it.
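For example, something along these lines in another terminal will show all three at once (the log path assumes a packaged install, adjust it for yours):

# watch an in-flight repair from another terminal
while true; do
  nodetool compactionstats    # "Validation" compactions are the Merkle tree builds
  nodetool netstats           # streaming of out-of-sync ranges between replicas
  grep AntiEntropyService /var/log/cassandra/system.log | tail -5
  sleep 30
done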
Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 4/04/2013, at 5:42 PM, Ondřej Černoš <cernoso@gmail.com> wrote:

> Hi,
>
> Most of this has been resolved - the FAILED_TO_UNCOMPRESS error was really a
> bug in Cassandra (see https://issues.apache.org/jira/browse/CASSANDRA-5391),
> and the difference in load reporting is a change between 1.2.1 (which reports
> 100% for the 3 replicas / 3 nodes / 2 DCs setup I have) and 1.2.3, which
> reports the fraction. Is this correct?
>
> Anyway, nodetool repair still takes ages to finish, considering only
> megabytes of unchanging data are involved in my test:
>
> [root@host:/etc/puppet] nodetool repair ks
> [2013-04-04 13:26:46,618] Starting repair command #1, repairing 1536 ranges for keyspace ks
> [2013-04-04 13:47:17,007] Repair session 88ebc700-9d1a-11e2-a0a1-05b94e1385c7 for range (-2270395505556181001,-2268004533044804266] finished
> ...
> [2013-04-04 13:47:17,063] Repair session 65d31180-9d1d-11e2-a0a1-05b94e1385c7 for range (1069254279177813908,1070290707448386360] finished
> [2013-04-04 13:47:17,063] Repair command #1 finished
>
> This is the status before the repair (by the way, after this datacenter
> has been bootstrapped from the remote one):
>
> [root@host:/etc/puppet] nodetool status
> Datacenter: us-east
> ===================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load     Tokens  Owns   Host ID                               Rack
> UN  xxx.xxx.xxx.xxx  5.74 MB  256     17.1%  06ff8328-32a3-4196-a31f-1e0f608d0638  1d
> UN  xxx.xxx.xxx.xxx  5.73 MB  256     15.3%  7a96bf16-e268-433a-9912-a0cf1668184e  1d
> UN  xxx.xxx.xxx.xxx  5.72 MB  256     17.5%  67a68a2a-12a8-459d-9d18-221426646e84  1d
> Datacenter: na-dev
> ==================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load     Tokens  Owns   Host ID                               Rack
> UN  xxx.xxx.xxx.xxx  5.74 MB  256     16.4%  eb86aaae-ef0d-40aa-9b74-2b9704c77c0a  cmp02
> UN  xxx.xxx.xxx.xxx  5.74 MB  256     17.0%  cd24af74-7f6a-4eaa-814f-62474b4e4df1  cmp01
> UN  xxx.xxx.xxx.xxx  5.74 MB  256     16.7%  1a55cfd4-bb30-4250-b868-a9ae13d81ae1  cmp05
>
> Why does it take 20 minutes to finish? Fortunately the large number of
> compactions I reported in the previous email was not triggered this time.
>
> And is there documentation where I could find the exact semantics of
> repair when vnodes are used (and what -pr means in such a setup) and
> when it is run in a multiple datacenter setup? I still don't quite get it.
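I don't know of a single page that spells it all out, but roughly: -pr restricts the repair to the ranges the node is the primary replica for, so with vnodes that is the 256 ranges each of your nodes owns, and to cover the whole ring you have to run it on every node in every DC. A sketch of that pattern, with placeholder host names:

# primary-range repair on every node, one node at a time, so each token
# range in the cluster is repaired exactly once (host names are placeholders)
for host in dc1-node1 dc1-node2 dc1-node3 dc2-node1 dc2-node2 dc2-node3; do
    ssh "$host" nodetool repair -pr ks
done

Without -pr, a plain repair on one node covers every range that node replicates, which with RF 3 repeats most of the work when you then run it node by node.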
> regards,
> Ondřej Černoš
>
> On Thu, Mar 28, 2013 at 3:30 AM, aaron morton <aaron@thelastpickle.com> wrote:
>> During one of my tests - see this thread in this mailing list:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html
>>
>> That thread has been updated, check the bug Ondřej created.
>>
>> How will this perform in production with much bigger data if repair
>> takes 25 minutes on 7MB and 11k compactions were triggered by the
>> repair run?
>>
>> Seems a little odd.
>> See what happens the next time you run repair.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 27/03/2013, at 2:36 AM, Ondřej Černoš <cernoso@gmail.com> wrote:
>>
>> Hi all,
>>
>> I have 2 DCs with 3 nodes each, RF 3, and I use LOCAL_QUORUM for both
>> reads and writes.
>>
>> Currently I am testing various operational qualities of the setup.
>>
>> During one of my tests - see this thread in this mailing list:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html
>> - I ran into this situation:
>>
>> - all nodes have all the data and agree on it:
>>
>> [user@host1-dc1:~] nodetool status
>>
>> Datacenter: na-prod
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
>> UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
>> UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            039f206e-da22-44b5-83bd-2513f96ddeac  cmp10
>> UN  XXX.XXX.XXX.XXX  7.72 MB  256     100.0%            007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
>> Datacenter: us-east
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
>> UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
>> UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
>> UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            f53fd294-16cc-497e-9613-347f07ac3850  1d
>>
>> - only one node disagrees:
>>
>> [user@host1-dc2:~] nodetool status
>> Datacenter: us-east
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address          Load     Tokens  Owns   Host ID                               Rack
>> UN  XXX.XXX.XXX.XXX  7.73 MB  256     17.6%  a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
>> UN  XXX.XXX.XXX.XXX  7.75 MB  256     16.4%  ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
>> UN  XXX.XXX.XXX.XXX  7.73 MB  256     15.7%  f53fd294-16cc-497e-9613-347f07ac3850  1d
>> Datacenter: na-prod
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address          Load     Tokens  Owns   Host ID                               Rack
>> UN  XXX.XXX.XXX.XXX  7.74 MB  256     16.9%  0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
>> UN  XXX.XXX.XXX.XXX  7.72 MB  256     17.1%  007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
>> UN  XXX.XXX.XXX.XXX  7.73 MB  256     16.3%  039f206e-da22-44b5-83bd-2513f96ddeac  cmp10
>>
>> I tried rebuilding the node from scratch and repairing it, with no result -
>> still the same owns stats.
>>
>> The cluster is built on Cassandra 1.2.3 and uses vnodes.
>>
>> On a related note: the data size, as you can see, is really small.
>> The cluster was created by setting up the us-east datacenter,
>> populating it with the dataset, then building the na-prod datacenter
>> and running nodetool rebuild us-east. When I then ran nodetool repair,
>> it took 25 minutes to finish on this small dataset. Is this ok?
>>
>> One other thing I noticed is the number of compactions on the system
>> keyspace:
>>
>> /.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11694-TOC.txt
>> /.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11693-Statistics.db
>>
>> These files are from just after running the repair. Is this ok, considering
>> the dataset is 7 MB and no operations were running against the database
>> during the repair - no reads, no writes, nothing?
>>
>> How will this perform in production with much bigger data, if repair
>> takes 25 minutes on 7 MB and 11k compactions were triggered by the
>> repair run?
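For what it's worth, the generation number in those file names (the 11694 in system-schema_columnfamilies-ib-11694-TOC.txt) goes up by one for every new sstable written for that column family, flushes and compactions alike, so comparing it before and after a repair gives a rough feel for the churn. A quick check, assuming the default packaged data directory:

# live sstables for the CF right now (one Data.db per sstable)
ls /var/lib/cassandra/data/system/schema_columnfamilies/*-Data.db | wc -l

# highest generation written so far (4th dash-separated field of the file name)
ls /var/lib/cassandra/data/system/schema_columnfamilies/ | sort -t- -k4,4n | tail -1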
>>
>> regards,
>>
>> Ondrej Cernos