From cassandra-user-return-917-apmail-incubator-cassandra-user-archive=incubator.apache.org@incubator.apache.org Thu Oct 15 11:40:37 2009 Return-Path: Delivered-To: apmail-incubator-cassandra-user-archive@minotaur.apache.org Received: (qmail 74904 invoked from network); 15 Oct 2009 11:40:37 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 15 Oct 2009 11:40:37 -0000 Received: (qmail 12553 invoked by uid 500); 15 Oct 2009 11:40:37 -0000 Delivered-To: apmail-incubator-cassandra-user-archive@incubator.apache.org Received: (qmail 12525 invoked by uid 500); 15 Oct 2009 11:40:36 -0000 Mailing-List: contact cassandra-user-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: cassandra-user@incubator.apache.org Delivered-To: mailing list cassandra-user@incubator.apache.org Received: (qmail 12516 invoked by uid 99); 15 Oct 2009 11:40:36 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 15 Oct 2009 11:40:36 +0000 X-ASF-Spam-Status: No, hits=-2.6 required=5.0 tests=AWL,BAYES_00 X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: local policy) Received: from [212.192.243.50] (HELO hook.sigaev.ru) (212.192.243.50) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 15 Oct 2009 11:40:33 +0000 Received: by hook.sigaev.ru (Postfix, from userid 1003) id 7CF01398F6; Thu, 15 Oct 2009 15:40:10 +0400 (MSD) X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on hook.sigaev.ru X-Spam-Level: Received: from xor.sai.msu.ru (unknown [195.218.191.68]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: teodor) by hook.sigaev.ru (Postfix) with ESMTPSA id 7D1AF398F2 for ; Thu, 15 Oct 2009 15:39:56 +0400 (MSD) Message-ID: <4AD70A0B.8010202@sigaev.ru> Date: Thu, 15 Oct 2009 15:39:55 +0400 From: Teodor Sigaev User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.8.1.23) Gecko/20090909 SeaMonkey/1.1.18 MIME-Version: 1.0 To: cassandra-user@incubator.apache.org Subject: Re: Cassandra restart References: <4AD5CF54.5080000@sigaev.ru> <4AD5D53B.3090909@sigaev.ru> <4AD5DB2C.9070309@sigaev.ru> <4AD5DF67.4070707@sigaev.ru> In-Reply-To: <4AD5DF67.4070707@sigaev.ru> Content-Type: multipart/mixed; boundary="------------010306010800020903030005" X-Old-Spam-Status: No, score=-104.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, USER_IN_WHITELIST autolearn=ham version=3.2.5 This is a multi-part message in MIME format. --------------010306010800020903030005 Content-Type: text/plain; charset=KOI8-R; format=flowed Content-Transfer-Encoding: 7bit > I will try to reproduce problem on smaller test cluster. It was rather easy, cluster contains 4 servers. Log's fragment from restarted node (10.2.3.38): DEBUG [pool-1-thread-64] 2009-10-15 14:18:16,290 CassandraServer.java (line 214) get_slice DEBUG [pool-1-thread-64] 2009-10-15 14:18:16,290 StorageProxy.java (line 239) weakreadlocal reading SliceFromReadCommand(table='Keyspace1', key='0000000000000000000000000000000000849706', column_parent='QueryPath(columnFamilyName='Super1', superColumnName='[B@6ca50fbe', columnName='null')', start='1', finish='0', reversed=true, count=2) DEBUG [pool-1-thread-64] 2009-10-15 14:18:16,290 StorageProxy.java (line 251) weakreadremote reading SliceFromReadCommand(table='Keyspace1', key='0000000000000000000000000000000000849706', column_parent='QueryPath(columnFamilyName='Super1', superColumnName='[B@6ca50fbe', columnName='null')', start='1', finish='0', reversed=true, count=2) from 207911@10.3.2.40:7000 ... ERROR [pool-1-thread-64] 2009-10-15 14:18:21,281 Cassandra.java (line 679) Internal error processing get_slice java.lang.RuntimeException: error reading key 0000000000000000000000000000000000849706 at org.apache.cassandra.service.StorageProxy.weakReadRemote(StorageProxy.java:265) at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:312) at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:95) at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:177) at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:252) at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:215) at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671) at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) Caused by: java.util.concurrent.TimeoutException: Operation timed out. at org.apache.cassandra.net.AsyncResult.get(AsyncResult.java:97) at org.apache.cassandra.service.StorageProxy.weakReadRemote(StorageProxy.java:261) ... 11 more Log's fragment from 10.2.3.40: DEBUG [ROW-READ-STAGE:4] 2009-10-15 14:18:16,308 ReadVerbHandler.java (line 100) Read key 0000000000000000000000000000000000849706; sending response to 207911@10.3.2.38:7000 .... DEBUG [CONSISTENCY-MANAGER:2] 2009-10-15 14:18:16,308 ConsistencyManager.java (line 168) Reading consistency digest for 0000000000000000000000000000000000849706 from 527021@[10.3.2.39:7000, 10.3.2.41:7000] I have full logs, but they are about half of gigabyte for each node. If it's needed I can put them somewhere accessible by http. How to reproduce: - configure cluster for 4 nodes, changes in storage-conf.xml: 3 8 16 - edit attached scripts with correct node's IPs - run perl writecluster.pl -c 8 and wait for 10-20 minutes - run perl readcluster.pl - look at error :) -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/ --------------010306010800020903030005 Content-Type: application/x-perl; name="writecluster.pl" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="writecluster.pl" #!/usr/bin/perl use lib qw(/usr/local/cassandra/interface/gen-perl /usr/local/cassandra/interface/gen-perl/Cassandra); use strict; use Carp; $SIG{ __DIE__ } = sub { Carp::confess( @_ ) }; use Cassandra; use Thrift::Socket; use Thrift::BinaryProtocol; use Thrift::BufferedTransport; use Data::Dumper; use Time::HiRes qw( gettimeofday tv_interval ); use Getopt::Std; my %opt; getopts('c:', \%opt); $opt{c} ||= 1; my @servers = ( '10.3.2.38', '10.3.2.39', '10.3.2.40', '10.3.2.41', ); my $t0 = [gettimeofday]; if ( $opt{c} <= 1) { ProcessTest($servers[0], 0); } else { for(my $i=0; $i<$opt{c};$i++) { my $server = $servers[ $i % (1+$#servers) ]; if ( fork()==0 ) { sleep(1); #Wait to fork all childs ProcessTest($server, $i); exit(0); } } my $status = 0; my $pid; while( ($pid=wait())>0 ) { $status ||= $?>>8; } print "All flow finish; status: $status\n"; } my $elapsed = tv_interval ( $t0, [gettimeofday]); print sprintf("ELAPSED: %.05f sec;", $elapsed); sub _makeColumnList($) { my ($row) = @_; my @cfmap; foreach my $k (keys %$row) { push @cfmap, Cassandra::ColumnOrSuperColumn->new({ column=>Cassandra::Column->new({ name=>$k, value=>$row->{$k}, timestamp=>time(), }) }); } die if $#cfmap < 0; return \@cfmap; } sub ProcessTest($$) { my ($server, $ith) = @_; my $transport; my $client; my $i=0; for(;;) { eval { if (!defined $client) { my $socket = Thrift::Socket->new($server, 9160); $socket->setSendTimeout(100000); $socket->setRecvTimeout(500000); $transport = Thrift::BufferedTransport->new($socket, 1024, 1024); my $protocol = Thrift::BinaryProtocol->new($transport); $client = Cassandra::CassandraClient->new($protocol); $transport->open(); } my $PID = sprintf("%040lld", int(1000000 * rand())); #$client->batch_insert( # 'Keyspace1', # $PID, # { # 'Standard1' => _makeColumnList ({ # map { # $_=>'0'x(int(1 + 100 * rand())) # } (0..int(1+10*rand())) # }) # }, # Cassandra::ConsistencyLevel::ONE #); my $PPID = sprintf("%040lld", int(1000000 * rand())); $client->insert( 'Keyspace1', $PID, Cassandra::ColumnPath->new({ column_family=>'Super1', super_column=>'x', column=>$PPID, }), rand(), time(), Cassandra::ConsistencyLevel::ONE ); $i++; print "$i records by $ith child\n" if ($i % 1000 == 0); }; if ($@) { print Dumper($@); $transport->close(); undef $client; } } } --------------010306010800020903030005 Content-Type: application/x-perl; name="readcluster.pl" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="readcluster.pl" #!/usr/bin/perl use lib qw(/usr/local/cassandra/interface/gen-perl /usr/local/cassandra/interface/gen-perl/Cassandra); use strict; use Carp; $SIG{ __DIE__ } = sub { Carp::confess( @_ ) }; use Cassandra; use Thrift::Socket; use Thrift::BinaryProtocol; use Thrift::BufferedTransport; use Data::Dumper; use Time::HiRes qw( gettimeofday tv_interval ); use Getopt::Std; my %opt; getopts('c:', \%opt); $opt{c} ||= 1; my @servers = ( '10.3.2.38', '10.3.2.39', '10.3.2.40', '10.3.2.41', ); my $t0 = [gettimeofday]; if ( $opt{c} <= 1) { ProcessTest($servers[0], 0); } else { for(my $i=0; $i<$opt{c};$i++) { my $server = $servers[ $i % (1+$#servers) ]; if ( fork()==0 ) { sleep(1); #Wait to fork all childs ProcessTest($server, $i); exit(0); } } my $status = 0; my $pid; while( ($pid=wait())>0 ) { $status ||= $?>>8; } print "All flow finish; status: $status\n"; } my $elapsed = tv_interval ( $t0, [gettimeofday]); print sprintf("ELAPSED: %.05f sec;", $elapsed); sub ProcessTest($$) { my ($server, $ith) = @_; my $transport; my $client; my $i=0; my $PID; for(;;) { $PID = ''; eval { if (!defined $client) { my $socket = Thrift::Socket->new($server, 9160); $socket->setSendTimeout(1000); $socket->setRecvTimeout(5000); $transport = Thrift::BufferedTransport->new($socket, 1024, 1024); my $protocol = Thrift::BinaryProtocol->new($transport); $client = Cassandra::CassandraClient->new($protocol); $transport->open(); } $PID = sprintf("%040lld", int(1000000 * rand())); my $cols = $client->get_slice( 'Keyspace1', $PID, Cassandra::ColumnParent->new({ column_family=>'Super1', super_column=>'x' }), Cassandra::SlicePredicate->new({ #column_names=>['0000000000000000000001'], slice_range=>Cassandra::SliceRange->new({ start=> '1', finish=> '0', reversed=>1, count=>2 }) }), Cassandra::ConsistencyLevel::ONE ); #print "KEY $PID ; ".Dumper($cols) if (ref($cols) && $#$cols >= 0); $i++; print "$i records by $ith child\n" if ($i % 1000 == 0); }; if ($@) { print "KEY $PID ; ".Dumper($@); $transport->close(); undef $client; } } } --------------010306010800020903030005--