From: Jonathan Ellis
Date: Thu, 15 Sep 2011 09:00:17 -0500
Subject: Re: New node unable to stream (0.8.5)
To: user@cassandra.apache.org

Hinted handoff doesn't use streaming mode, so it doesn't care.
("Streaming" to Cassandra means sending raw sstable file ranges to
another node. HH just uses the normal column-based write path.)

On Thu, Sep 15, 2011 at 8:24 AM, Ethan Rowe wrote:
> Thanks, Jonathan. I'll try the workaround and see if that gets the
> streams flowing properly.
>
> As I mentioned before, we did not run scrub yet. What is the
> consequence of letting the streams from the hinted handoffs complete
> if scrub hasn't been run on these nodes?
>
> I'm currently running scrub on one node to get a sense of the time
> frame.
>
> Thanks again.
> - Ethan
>
> On Thu, Sep 15, 2011 at 9:09 AM, Jonathan Ellis wrote:
>>
>> That means we missed a place we needed to special-case for backwards
>> compatibility -- the workaround is to add an empty encryption_options
>> section to cassandra.yaml:
>>
>> encryption_options:
>>     internode_encryption: none
>>     keystore: conf/.keystore
>>     keystore_password: cassandra
>>     truststore: conf/.truststore
>>     truststore_password: cassandra
>>
>> Created https://issues.apache.org/jira/browse/CASSANDRA-3212 to fix
>> this.
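
For anyone applying the workaround: a minimal rollout sketch. The host
names, yaml path, and restart command below are illustrative; adjust
them to your deployment.

    # encryption_options.yaml contains exactly the no-op section quoted
    # above. Append it on each node, then restart that node so the new
    # section is picked up.
    for host in node1 node2 node3; do
        ssh "$host" "cat >> /etc/cassandra/cassandra.yaml" < encryption_options.yaml
        ssh "$host" "service cassandra restart"
    done
    # Afterwards, streaming progress should advance again:
    nodetool -h node1 netstats
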
>>
>> On Thu, Sep 15, 2011 at 7:13 AM, Ethan Rowe wrote:
>> > Here's a typical log slice (not terribly informative, I fear):
>> >>
>> >>  INFO [AntiEntropyStage:2] 2011-09-15 05:41:36,106 AntiEntropyService.java (line 884) Performing streaming repair of 1003 ranges with /10.34.90.8 for (29990798416657667504332586989223299634,54296681768153272037430773234349600451]
>> >>  INFO [AntiEntropyStage:2] 2011-09-15 05:41:36,427 StreamOut.java (line 181) Stream context metadata [/mnt/cassandra/data/events_production/FitsByShip-g-10-Data.db sections=88 progress=0/11707163 - 0%, /mnt/cassandra/data/events_production/FitsByShip-g-11-Data.db sections=169 progress=0/6133240 - 0%, /mnt/cassandra/data/events_production/FitsByShip-g-6-Data.db sections=1 progress=0/6918814 - 0%, /mnt/cassandra/data/events_production/FitsByShip-g-12-Data.db sections=260 progress=0/9091780 - 0%], 4 sstables.
>> >>  INFO [AntiEntropyStage:2] 2011-09-15 05:41:36,428 StreamOutSession.java (line 174) Streaming to /10.34.90.8
>> >> ERROR [Thread-56] 2011-09-15 05:41:38,515 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-56,5,main]
>> >> java.lang.NullPointerException
>> >>         at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:174)
>> >>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)
>> >
>> > Not sure if the exception is related to the outbound streaming
>> > above; other nodes are actively trying to stream to this node, so
>> > perhaps it comes from those, and the temporal adjacency to the
>> > outbound stream is just coincidental. I have other snippets that
>> > look basically identical to the above, except that when I look at
>> > the logs of the node this one is streaming to, I see it has
>> > concurrently opened a stream in the other direction, which could be
>> > the one the exception pertains to.
>> >
>> > On Thu, Sep 15, 2011 at 7:41 AM, Sylvain Lebresne wrote:
>> >>
>> >> On Thu, Sep 15, 2011 at 1:16 PM, Ethan Rowe wrote:
>> >> > Hi.
>> >> >
>> >> > We've been running a 7-node cluster with RF 3, QUORUM
>> >> > reads/writes in our production environment for a few months.
>> >> > It's been consistently stable during this period, particularly
>> >> > once we got our maintenance strategy fully worked out (per node,
>> >> > one repair a week and one major compaction a week, the latter
>> >> > due to the nature of our data model and usage). While this
>> >> > cluster started, back in June or so, on the 0.7 series, it's
>> >> > been running 0.8.3 for a while now with no issues. We upgraded
>> >> > to 0.8.5 two days ago, having previously tested the upgrade in
>> >> > our staging cluster (with an otherwise identical configuration)
>> >> > and verified that our application's various use cases appeared
>> >> > successful.
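
As a concrete illustration of the maintenance cadence described above
(per node, one repair and one major compaction a week), a crontab
sketch; the specific days and hours are illustrative, staggered so that
only one node is repairing or compacting at a time:

    # Crontab for one node; offset the schedule on each node.
    0 2 * * 1  nodetool -h localhost repair   # weekly anti-entropy repair
    0 2 * * 4  nodetool -h localhost compact  # weekly major compaction
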
>> >> >
>> >> > One of our nodes suffered a disk failure yesterday. We attempted
>> >> > to replace the dead node by placing a new node at
>> >> > OldNode.initial_token - 1 with auto_bootstrap on. A few things
>> >> > went awry from there:
>> >> >
>> >> > 1. We never saw the new node in bootstrap mode; it became
>> >> > available pretty much immediately upon joining the ring, and
>> >> > never reported a "joining" state. I did verify that
>> >> > auto_bootstrap was on.
>> >> >
>> >> > 2. I mistakenly ran repair on the new node rather than
>> >> > removetoken on the old node, due to a delightful mental error.
>> >> > The repair got nowhere fast, as it attempts to repair against
>> >> > the down node, which throws an exception. So I interrupted the
>> >> > repair, restarted the node to clear any pending validation
>> >> > compactions, and...
>> >> >
>> >> > 3. Ran removetoken for the old node.
>> >> >
>> >> > 4. We let this run for some time and eventually saw that all the
>> >> > nodes appeared to be done with various compactions and were
>> >> > stuck at streaming. Many streams were listed as open, none
>> >> > making any progress.
>> >> >
>> >> > 5. I observed an RPC-related exception on the new node (where
>> >> > the removetoken was launched) and concluded that the streams
>> >> > were broken, so the process wouldn't ever finish.
>> >> >
>> >> > 6. Ran a "removetoken force" to get the dead node out of the
>> >> > mix. No problems.
>> >> >
>> >> > 7. Ran a repair on the new node.
>> >> >
>> >> > 8. Validations ran, streams opened up, and again things got
>> >> > stuck in streaming, hanging for over an hour with no progress.
>> >> >
>> >> > 9. Musing that lingering tasks from the removetoken could be a
>> >> > factor, I performed a rolling restart and attempted a repair
>> >> > again.
>> >> >
>> >> > 10. Same problem. Did another rolling restart and attempted a
>> >> > fresh repair on the most important column family alone.
>> >> >
>> >> > 11. Same problem. Streams included CFs not specified, so I guess
>> >> > they must be for hinted handoff.
>> >> >
>> >> > In concluding that streaming is stuck, I've observed:
>> >> > - Streams will be open to the new node from other nodes, but the
>> >> > new node doesn't list them.
>> >> > - Streams will be open to the other nodes from the new node, but
>> >> > the other nodes don't list them.
>> >> > - The streams reported may make some initial progress, but then
>> >> > they hang at a particular point and do not move on for an hour
>> >> > or more.
>> >> > - The logs report repair-related activity, until NPEs on
>> >> > incoming TCP connections show up, which appear likely to be the
>> >> > culprit.
>> >>
>> >> Can you send the stack trace from those NPEs?
>> >>
>> >> > I can provide more exact details when I'm done commuting.
>> >> >
>> >> > With streaming broken on this node, I'm unable to run repairs,
>> >> > which is obviously problematic. The application didn't suffer
>> >> > any operational issues as a consequence of this, but I need to
>> >> > review the overnight results to verify we're not suffering data
>> >> > loss (I doubt we are).
>> >> >
>> >> > At this point, I'm considering a couple of options:
>> >> > 1. Remove the new node and let the adjacent node take over its
>> >> > range.
>> >> > 2. Bring the new node down, add a new one in front of it, and
>> >> > properly removetoken the problematic one.
>> >> > 3. Bring the new node down, remove all its data except for the
>> >> > system keyspace, then bring it back up and repair it.
>> >> > 4. Revert to 0.8.3 and see if that helps.
>> >> >
>> >> > Recommendations?
>> >> >
>> >> > Thanks.
>> >> > - Ethan
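
On option 3, a sketch of the steps, assuming the data directory layout
shown in the logs above (/mnt/cassandra/data) and stock nodetool;
verify the paths against your own configuration first:

    nodetool -h localhost drain    # flush memtables, stop accepting writes
    # Stop the cassandra process on the node, then:
    cd /mnt/cassandra/data
    ls | grep -v '^system$' | xargs rm -rf   # keep only the system keyspace
    # Start cassandra again, then rebuild the node's data:
    nodetool -h localhost repair
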
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
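
A practical footnote on the streaming-vs-hinted-handoff distinction: the
two show up in different nodetool views on a live node. The host below
is taken from the logs earlier in the thread, and the exact thread-pool
name may vary by version.

    # Hinted handoff rides the normal write path, so it appears as a
    # thread-pool stage in tpstats (look for a HintedHandoff pool):
    nodetool -h 10.34.90.8 tpstats
    # Streaming sessions (raw sstable range transfers) are listed
    # separately:
    nodetool -h 10.34.90.8 netstats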