From: Jonathan Ellis
Date: Thu, 15 Sep 2011 09:00:17 -0500
Subject: Re: New node unable to stream (0.8.5)
To: user@cassandra.apache.org

Hinted handoff doesn't use streaming mode, so it doesn't care.
("Streaming" to Cassandra means sending raw sstable file ranges to
another node. HH just uses the normal column-based write path.)

On Thu, Sep 15, 2011 at 8:24 AM, Ethan Rowe wrote:
> Thanks, Jonathan. I'll try the workaround and see if that gets the
> streams flowing properly.
>
> As I mentioned before, we did not run scrub yet. What is the
> consequence of letting the streams from the hinted handoffs complete
> if scrub hasn't been run on these nodes?
>
> I'm currently running scrub on one node to get a sense of the time
> frame.
>
> Thanks again.
> - Ethan
>
> On Thu, Sep 15, 2011 at 9:09 AM, Jonathan Ellis wrote:
>>
>> That means we missed a place we needed to special-case for backwards
>> compatibility -- the workaround is to add an empty encryption_options
>> section to cassandra.yaml:
>>
>> encryption_options:
>>     internode_encryption: none
>>     keystore: conf/.keystore
>>     keystore_password: cassandra
>>     truststore: conf/.truststore
>>     truststore_password: cassandra
>>
>> Created https://issues.apache.org/jira/browse/CASSANDRA-3212 to fix
>> this.
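
For anyone applying the workaround: a minimal rollout sketch. The host
names, yaml path, and restart command below are illustrative; adjust
them to your deployment.

    # encryption_options.yaml contains exactly the no-op section quoted
    # above. Append it on each node, then restart that node so the new
    # section is picked up.
    for host in node1 node2 node3; do
        ssh "$host" "cat >> /etc/cassandra/cassandra.yaml" < encryption_options.yaml
        ssh "$host" "service cassandra restart"
    done
    # Afterwards, streaming progress should advance again:
    nodetool -h node1 netstats
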
>>
>> On Thu, Sep 15, 2011 at 7:13 AM, Ethan Rowe wrote:
>> > Here's a typical log slice (not terribly informative, I fear):
>> >>
>> >>  INFO [AntiEntropyStage:2] 2011-09-15 05:41:36,106 AntiEntropyService.java (line 884) Performing streaming repair of 1003 ranges with /10.34.90.8 for (29990798416657667504332586989223299634,54296681768153272037430773234349600451]
>> >>  INFO [AntiEntropyStage:2] 2011-09-15 05:41:36,427 StreamOut.java (line 181) Stream context metadata [/mnt/cassandra/data/events_production/FitsByShip-g-10-Data.db sections=88 progress=0/11707163 - 0%, /mnt/cassandra/data/events_production/FitsByShip-g-11-Data.db sections=169 progress=0/6133240 - 0%, /mnt/cassandra/data/events_production/FitsByShip-g-6-Data.db sections=1 progress=0/6918814 - 0%, /mnt/cassandra/data/events_production/FitsByShip-g-12-Data.db sections=260 progress=0/9091780 - 0%], 4 sstables.
>> >>  INFO [AntiEntropyStage:2] 2011-09-15 05:41:36,428 StreamOutSession.java (line 174) Streaming to /10.34.90.8
>> >> ERROR [Thread-56] 2011-09-15 05:41:38,515 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-56,5,main]
>> >> java.lang.NullPointerException
>> >>         at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:174)
>> >>         at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:114)
>> >
>> > Not sure if the exception is related to the outbound streaming
>> > above; other nodes are actively trying to stream to this node, so
>> > perhaps it comes from those, and the temporal adjacency to the
>> > outbound stream is just coincidental. I have other snippets that
>> > look basically identical to the above, except that when I look at
>> > the logs of the node this one is streaming to, I see it has
>> > concurrently opened a stream in the other direction, which could be
>> > the one the exception pertains to.
>> >
>> > On Thu, Sep 15, 2011 at 7:41 AM, Sylvain Lebresne wrote:
>> >>
>> >> On Thu, Sep 15, 2011 at 1:16 PM, Ethan Rowe wrote:
>> >> > Hi.
>> >> >
>> >> > We've been running a 7-node cluster with RF 3, QUORUM
>> >> > reads/writes in our production environment for a few months.
>> >> > It's been consistently stable during this period, particularly
>> >> > once we got our maintenance strategy fully worked out (per node,
>> >> > one repair a week and one major compaction a week, the latter
>> >> > due to the nature of our data model and usage). While this
>> >> > cluster started, back in June or so, on the 0.7 series, it's
>> >> > been running 0.8.3 for a while now with no issues. We upgraded
>> >> > to 0.8.5 two days ago, having previously tested the upgrade in
>> >> > our staging cluster (with an otherwise identical configuration)
>> >> > and verified that our application's various use cases appeared
>> >> > successful.
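
As a concrete illustration of the maintenance cadence described above
(per node, one repair and one major compaction a week), a crontab
sketch; the specific days and hours are illustrative, staggered so that
only one node is repairing or compacting at a time:

    # Crontab for one node; offset the schedule on each node.
    0 2 * * 1  nodetool -h localhost repair   # weekly anti-entropy repair
    0 2 * * 4  nodetool -h localhost compact  # weekly major compaction
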
>> >> >
>> >> > One of our nodes suffered a disk failure yesterday. We attempted
>> >> > to replace the dead node by placing a new node at
>> >> > OldNode.initial_token - 1 with auto_bootstrap on. A few things
>> >> > went awry from there:
>> >> >
>> >> > 1. We never saw the new node in bootstrap mode; it became
>> >> > available pretty much immediately upon joining the ring, and
>> >> > never reported a "joining" state. I did verify that
>> >> > auto_bootstrap was on.
>> >> >
>> >> > 2. I mistakenly ran repair on the new node rather than
>> >> > removetoken on the old node, due to a delightful mental error.
>> >> > The repair got nowhere fast, as it attempts to repair against
>> >> > the down node, which throws an exception. So I interrupted the
>> >> > repair, restarted the node to clear any pending validation
>> >> > compactions, and...
>> >> >
>> >> > 3. Ran removetoken for the old node.
>> >> >
>> >> > 4. We let this run for some time and eventually saw that all the
>> >> > nodes appeared to be done with various compactions and were
>> >> > stuck at streaming. Many streams were listed as open, none
>> >> > making any progress.
>> >> >
>> >> > 5. I observed an RPC-related exception on the new node (where
>> >> > the removetoken was launched) and concluded that the streams
>> >> > were broken, so the process wouldn't ever finish.
>> >> >
>> >> > 6. Ran a "removetoken force" to get the dead node out of the
>> >> > mix. No problems.
>> >> >
>> >> > 7. Ran a repair on the new node.
>> >> >
>> >> > 8. Validations ran, streams opened up, and again things got
>> >> > stuck in streaming, hanging for over an hour with no progress.
>> >> >
>> >> > 9. Musing that lingering tasks from the removetoken could be a
>> >> > factor, I performed a rolling restart and attempted a repair
>> >> > again.
>> >> >
>> >> > 10. Same problem. Did another rolling restart and attempted a
>> >> > fresh repair on the most important column family alone.
>> >> >
>> >> > 11. Same problem. Streams included CFs not specified, so I guess
>> >> > they must be for hinted handoff.
>> >> >
>> >> > In concluding that streaming is stuck, I've observed:
>> >> > - Streams will be open to the new node from other nodes, but the
>> >> > new node doesn't list them.
>> >> > - Streams will be open to the other nodes from the new node, but
>> >> > the other nodes don't list them.
>> >> > - The streams reported may make some initial progress, but then
>> >> > they hang at a particular point and do not move on for an hour
>> >> > or more.
>> >> > - The logs report repair-related activity, until NPEs on
>> >> > incoming TCP connections show up, which appear likely to be the
>> >> > culprit.
>> >>
>> >> Can you send the stack trace from those NPEs?
>> >>
>> >> > I can provide more exact details when I'm done commuting.
>> >> >
>> >> > With streaming broken on this node, I'm unable to run repairs,
>> >> > which is obviously problematic. The application didn't suffer
>> >> > any operational issues as a consequence of this, but I need to
>> >> > review the overnight results to verify we're not suffering data
>> >> > loss (I doubt we are).
>> >> >
>> >> > At this point, I'm considering a couple of options:
>> >> > 1. Remove the new node and let the adjacent node take over its
>> >> > range.
>> >> > 2. Bring the new node down, add a new one in front of it, and
>> >> > properly removetoken the problematic one.
>> >> > 3. Bring the new node down, remove all its data except for the
>> >> > system keyspace, then bring it back up and repair it.
>> >> > 4. Revert to 0.8.3 and see if that helps.
>> >> >
>> >> > Recommendations?
>> >> >
>> >> > Thanks.
>> >> > - Ethan
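
On option 3, a sketch of the steps, assuming the data directory layout
shown in the logs above (/mnt/cassandra/data) and stock nodetool;
verify the paths against your own configuration first:

    nodetool -h localhost drain    # flush memtables, stop accepting writes
    # Stop the cassandra process on the node, then:
    cd /mnt/cassandra/data
    ls | grep -v '^system$' | xargs rm -rf   # keep only the system keyspace
    # Start cassandra again, then rebuild the node's data:
    nodetool -h localhost repair
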
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
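
A practical footnote on the streaming-vs-hinted-handoff distinction: the
two show up in different nodetool views on a live node. The host below
is taken from the logs earlier in the thread, and the exact thread-pool
name may vary by version.

    # Hinted handoff rides the normal write path, so it appears as a
    # thread-pool stage in tpstats (look for a HintedHandoff pool):
    nodetool -h 10.34.90.8 tpstats
    # Streaming sessions (raw sstable range transfers) are listed
    # separately:
    nodetool -h 10.34.90.8 netstats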