From: igor@4friends.od.ua
To: user@cassandra.apache.org
Date: Wed, 14 Dec 2011 15:02:09 +0200 (GMT+02:00)
Subject: Re: One ColumnFamily places data on only 3 out of 4 nodes

Do you use RandomPartitioner? What does nodetool getendpoints show for several random keys?
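
For reference, here is a minimal sketch of how the replica set for a key could be worked out by hand, which is roughly what nodetool getendpoints reports. It assumes RandomPartitioner (token = absolute value of the MD5 digest read as a signed big-endian integer) and SimpleStrategy with RF=3 (first node whose token is >= the key token, then the next nodes clockwise). The ring tokens and addresses are taken from the quoted nodetool ring output; the function names and the sample row keys are hypothetical and are not part of Cassandra.

```python
import hashlib
from bisect import bisect_left

# Ring from the quoted `nodetool ring` output: (token, node address).
RING = sorted([
    (0, "192.168.81.2"),
    (42535295865117307932921825928971026432, "192.168.81.3"),
    (85070591730234615865843651857942052864, "192.168.81.4"),
    (127605887595351923798765477786913079296, "192.168.81.5"),
])
TOKENS = [token for token, _ in RING]
RF = 3  # replication_factor from the quoted schema


def random_partitioner_token(key: bytes) -> int:
    """Assumed RandomPartitioner rule: abs() of the MD5 digest
    interpreted as a signed big-endian integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def replicas_for_token(token: int, rf: int = RF) -> list:
    """SimpleStrategy-style placement: start at the first node whose
    token is >= the key token (wrapping around the ring), then take
    the next rf - 1 nodes clockwise."""
    start = bisect_left(TOKENS, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


def replicas_for_key(key: bytes) -> list:
    """Replica addresses for a raw row key under the assumptions above."""
    return replicas_for_token(random_partitioner_token(key))


if __name__ == "__main__":
    # Hypothetical row keys; substitute real keys from the column family.
    for key in (b"user1", b"user2", b"user3"):
        print(key, replicas_for_key(key))
```

Under these assumptions, 192.168.81.2 is missing only from the replica set of the range (0, 42535295865117307932921825928971026432]. If a column family has very few row keys and all of them hash into that range, that node would hold no data for it, so comparing this sketch with the real getendpoints output for a few keys may help narrow things down.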

-----Original Message-----
From: Bart Swedrowski <bart@timedout.org>
To: user@cassandra.apache.org
Sent: Wed, 14 Dec 2011 12:56
Subject: Re: One ColumnFamily places data on only 3 out of 4 nodes

Anyone?

On 12 December 2011 15:25, Bart Swedrowski <bart@timedout.org> wrote:
> Hello everyone,
>
> I seem to have come across a rather weird (at least for me!) problem /
> behaviour with Cassandra.
>
> I am running a 4-node cluster on Cassandra 0.8.7. For the keyspace in
> question, I have RF=3, SimpleStrategy with multiple ColumnFamilies inside
> the KeySpace. One of the ColumnFamilies, however, seems to have data
> distributed across only 3 out of 4 nodes.
>
> The data on the cluster besides the problematic ColumnFamily seems to be
> more or less equal and even.
>
> # nodetool -h localhost ring
> Address         DC          Rack        Status State   Load            Owns    Token
>                                                                                127605887595351923798765477786913079296
> 192.168.81.2    datacenter1 rack1       Up     Normal  7.27 GB         25.00%  0
> 192.168.81.3    datacenter1 rack1       Up     Normal  7.74 GB         25.00%  42535295865117307932921825928971026432
> 192.168.81.4    datacenter1 rack1       Up     Normal  7.38 GB         25.00%  85070591730234615865843651857942052864
> 192.168.81.5    datacenter1 rack1       Up     Normal  7.32 GB         25.00%  127605887595351923798765477786913079296
>
> Schema for the relevant bits of the keyspace is as follows:
>
> [default@A] show schema;
> create keyspace A
>   with placement_strategy = 'SimpleStrategy'
>   and strategy_options = [{replication_factor : 3}];
> [...]
> create column family UserDetails
>   with column_type = 'Standard'
>   and comparator = 'IntegerType'
>   and default_validation_class = 'BytesType'
>   and key_validation_class = 'BytesType'
>   and memtable_operations = 0.571875
>   and memtable_throughput = 122
>   and memtable_flush_after = 1440
>   and rows_cached = 0.0
>   and row_cache_save_period = 0
>   and keys_cached = 200000.0
>   and key_cache_save_period = 14400
>   and read_repair_chance = 1.0
>   and gc_grace = 864000
>   and min_compaction_threshold = 4
>   and max_compaction_threshold = 32
>   and replicate_on_write = true
>   and row_cache_provider = 'ConcurrentLinkedHashCacheProvider';
>
> And now the symptoms - output of 'nodetool -h localhost cfstats' on each
> node. Please note the figures on node1.
>
> *node1*
> Column Family: UserDetails
> SSTable count: 0
> Space used (live): 0
> Space used (total): 0
> Number of Keys (estimate): 0
> Memtable Columns Count: 0
> Memtable Data Size: 0
> Memtable Switch Count: 0
> Read Count: 0
> Read Latency: NaN ms.
> Write Count: 0
> Write Latency: NaN ms.
> Pending Tasks: 0
> Key cache capacity: 200000
> Key cache size: 0
> Key cache hit rate: NaN
> Row cache: disabled
> Compacted row minimum size: 0
> Compacted row maximum size: 0
> Compacted row mean size: 0
>
> *node2*
> Column Family: UserDetails
> SSTable count: 3
> Space used (live): 112952788
> Space used (total): 164953743
> Number of Keys (estimate): 384
> Memtable Columns Count: 159419
> Memtable Data Size: 74910890
> Memtable Switch Count: 59
> Read Count: 135307426
> Read Latency: 25.900 ms.
> Write Count: 3474673
> Write Latency: 0.040 ms.
> Pending Tasks: 0
> Key cache capacity: 200000
> Key cache size: 120
> Key cache hit rate: 0.999971684189041
> Row cache: disabled
> Compacted row minimum size: 42511
> Compacted row maximum size: 74975550
> Compacted row mean size: 42364305
>
> *node3*
> Column Family: UserDetails
> SSTable count: 3
> Space used (live): 112953137
> Space used (total): 112953137
> Number of Keys (estimate): 384
> Memtable Columns Count: 159421
> Memtable Data Size: 74693445
> Memtable Switch Count: 56
> Read Count: 135304486
> Read Latency: 25.552 ms.
> Write Count: 3474616
> Write Latency: 0.036 ms.
> Pending Tasks: 0
> Key cache capacity: 200000
> Key cache size: 109
> Key cache hit rate: 0.9999716840888175
> Row cache: disabled
> Compacted row minimum size: 42511
> Compacted row maximum size: 74975550
> Compacted row mean size: 42364305
>
> *node4*
> Column Family: UserDetails
> SSTable count: 3
> Space used (live): 117070926
> Space used (total): 119479484
> Number of Keys (estimate): 384
> Memtable Columns Count: 159979
> Memtable Data Size: 75029672
> Memtable Switch Count: 60
> Read Count: 135294878
> Read Latency: 19.455 ms.
> Write Count: 3474982
> Write Latency: 0.028 ms.
> Pending Tasks: 0
> Key cache capacity: 200000
> Key cache size: 119
> Key cache hit rate: 0.9999752235777154
> Row cache: disabled
> Compacted row minimum size: 2346800
> Compacted row maximum size: 62479625
> Compacted row mean size: 42591803
>
> When I go to the 'data' directory on node1, there are no files for the
> UserDetails ColumnFamily.
>
> I tried performing a manual repair in the hope that it would heal the
> situation, but without any luck.
>
> # nodetool -h localhost repair A UserDetails
> INFO 15:19:54,611 Starting repair command #8, repairing 3 ranges.
> INFO 15:19:54,647 Sending AEService tree for #<TreeRequest manual-repair-89c1acb0-184c-438f-bab8-7ceed27980ec, /192.168.81.2, (A,UserDetails), (85070591730234615865843651857942052864,127605887595351923798765477786913079296]>
> INFO 15:19:54,742 Endpoints /192.168.81.2 and /192.168.81.3 are consistent for UserDetails on (85070591730234615865843651857942052864,127605887595351923798765477786913079296]
> INFO 15:19:54,750 Endpoints /192.168.81.2 and /192.168.81.5 are consistent for UserDetails on (85070591730234615865843651857942052864,127605887595351923798765477786913079296]
> INFO 15:19:54,751 Repair session manual-repair-89c1acb0-184c-438f-bab8-7ceed27980ec (on cfs [Ljava.lang.String;@3491507b, range (85070591730234615865843651857942052864,127605887595351923798765477786913079296]) completed successfully
> INFO 15:19:54,816 Sending AEService tree for #<TreeRequest manual-repair-6d2438ca-a05c-4217-92c7-c2ad563a92dd, /192.168.81.2, (A,UserDetails), (42535295865117307932921825928971026432,85070591730234615865843651857942052864]>
> INFO 15:19:54,865 Endpoints /192.168.81.2 and /192.168.81.4 are consistent for UserDetails on (42535295865117307932921825928971026432,85070591730234615865843651857942052864]
> INFO 15:19:54,874 Endpoints /192.168.81.2 and /192.168.81.5 are consistent for UserDetails on (42535295865117307932921825928971026432,85070591730234615865843651857942052864]
> INFO 15:19:54,874 Repair session manual-repair-6d2438ca-a05c-4217-92c7-c2ad563a92dd (on cfs [Ljava.lang.String;@7e541d08, range (42535295865117307932921825928971026432,85070591730234615865843651857942052864]) completed successfully
> INFO 15:19:54,909 Sending AEService tree for #<TreeRequest manual-repair-98d1a21c-9d6e-41c8-8917-aea70f716243, /192.168.81.2, (A,UserDetails), (127605887595351923798765477786913079296,0]>
> INFO 15:19:54,967 Endpoints /192.168.81.2 and /192.168.81.3 are consistent for UserDetails on (127605887595351923798765477786913079296,0]
> INFO 15:19:54,974 Endpoints /192.168.81.2 and /192.168.81.4 are consistent for UserDetails on (127605887595351923798765477786913079296,0]
> INFO 15:19:54,975 Repair session manual-repair-98d1a21c-9d6e-41c8-8917-aea70f716243 (on cfs [Ljava.lang.String;@48c651f2, range (127605887595351923798765477786913079296,0]) completed successfully
> INFO 15:19:54,975 Repair command #8 completed successfully
>
> As I am using SimpleStrategy, I would expect the keys to be split more or
> less equally across the nodes; however, this doesn't seem to be the case.
>
> Has anyone come across similar behaviour before? Does anyone have any
> suggestions for what I could do to bring some data onto node1? Obviously,
> this kind of data split means node2, node3 and node4 need to do all the
> read work, which is not ideal.
>
> Any suggestions much appreciated.
>
> Kind regards,
> Bart
