Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: <BANLkTimMBAcjsYrgN95CWr3LPHBCoKakZg@mail.gmail.com>
References: <BANLkTikK_HLwFRZHZWn5gBxXrdhMichPbg@mail.gmail.com>
	<BANLkTinapLhp7oY8Fe05Wgy-oQVLr_ogJw@mail.gmail.com>
	<BANLkTi=9DMC8Yby_67JYs5kR2BHzjLrQVA@mail.gmail.com>
	<BANLkTimMBAcjsYrgN95CWr3LPHBCoKakZg@mail.gmail.com>
Date: Fri, 24 Jun 2011 09:50:26 -0500
Message-ID: <BANLkTi=SJn4n0EKz+mP3wLQOGL_i89P8YA@mail.gmail.com>
Subject: Re: Restarting cluster
From: David McNelis <dmcnelis@agentisenergy.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=00504502d7577531c404a67652bf

--00504502d7577531c404a67652bf
Content-Type: text/plain; charset=ISO-8859-1

It was port 7000 that was my issue.  I had assumed everything went over
9160, and hadn't made sure that port 7000 was open between the nodes.
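
For anyone who finds this thread later, here is a rough sketch of the kind of
check Jonathan suggested with netcat, written in Python in case netcat is not
installed. The function names are made up for illustration, and the port list
assumes the defaults (7000 for inter-node traffic, 9160 for Thrift clients);
adjust it if your cassandra.yaml says otherwise.

```python
import socket
import sys


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_cassandra_ports(host, ports=(7000, 9160)):
    """Check each port on host; returns a dict of {port: reachable}.

    The default ports are an assumption (stock storage and Thrift ports).
    """
    return {port: port_is_open(host, port) for port in ports}


if __name__ == "__main__":
    # Usage: python check_ports.py <other-node-ip> [<another-node-ip> ...]
    for node in sys.argv[1:]:
        for port, reachable in check_cassandra_ports(node).items():
            state = "open" if reachable else "NOT reachable"
            print(f"{node}:{port} is {state}")
```

Run it on each node against the other node's address. In my case 9160 would
have reported open while 7000 did not, which is why the cli connected fine
but gossip never marked the other node as Up.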

Thanks Sasha and Jonathan.

On Fri, Jun 24, 2011 at 8:42 AM, Jonathan Ellis <jbellis@gmail.com> wrote:

> Did you try netcat to verify that you can get to the internal port on
> machine X from machine Y?
>
> On Fri, Jun 24, 2011 at 8:20 AM, David McNelis
> <dmcnelis@agentisenergy.com> wrote:
> > Running on CentOS.
> > We had a massive power failure and our UPS wasn't up to 48 hours without
> > power...
> > In this situation the IP addresses have all stayed the same.  I can still
> > connect to the "other" node from cli, so I don't think it's an issue where
> > the iptables settings weren't saved and started blocking traffic.
> > The only related lines in the log files are:
> >  INFO [main] 2011-06-24 07:48:44,750 StorageService.java (line 382) Loading persisted ring state
> >  INFO [main] 2011-06-24 07:48:44,757 StorageService.java (line 418) Starting up server gossip
> > When I turn on debugging and restart the non-seed node I get this line:
> > DEBUG [WRITE-/192.168.80.XXX] 2011-06-24 08:04:48,798 OutboundTcpConnection.java (line 161) attempting to connect to /192.168.80.XXX
> > But no errors after it.
> >
> > On Fri, Jun 24, 2011 at 7:58 AM, Sasha Dolgy <sdolgy@gmail.com> wrote:
> >>
> >> Normally, no.  What you've done is fine.  What is the environment?
> >>
> >> On Amazon EC2, for example, the instance could have crashed and a new
> >> one been brought online with a different internal IP ...
> >>
> >> Are there any messages in cassandra/logs/system.log on the 2nd node
> >> about how it relates to the seed node?
> >>
> >> On Fri, Jun 24, 2011 at 2:49 PM, David McNelis
> >> <dmcnelis@agentisenergy.com> wrote:
> >> > I am running 0.8.0 on CentOS.  I have 2 nodes in my cluster: one is a
> >> > seed, the other is autobootstrapped.
> >> > After an unexpected shutdown of both physical machines I am trying to
> >> > restart the cluster.  I first started the seed node; it went through the
> >> > normal startup process and finished without error.  Once that was
> >> > complete I started the second node.  Again there were no errors in the
> >> > log as it was starting; it started the gossip server, etc.
> >> > However, when I look at the ring using nodetool, both machines show their
> >> > own status as Up, then show the other machine as Down with a state of
> >> > Normal and a load of ?.  I have tried restarting the individual nodes in
> >> > different orders and waiting a while after restarting a node, but the
> >> > 'other' node always has a status of "Down".  nodetool repair [keyspace]
> >> > did not make any difference either, and nodetool join just told me that
> >> > the nodes were already a part of the ring.
> >> > I can't imagine this is how it *should* be behaving... is there a piece
> >> > I'm missing in terms of getting one node to recognize the other as being
> >> > Up?
> >
> >
> >
> > --
> > David McNelis
> > Lead Software Engineer
> > Agentis Energy
> > www.agentisenergy.com
> > o: 630.359.6395
> > c: 219.384.5143
> > A Smart Grid technology company focused on helping consumers of energy
> > control an often under-managed resource.
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


-- 
*David McNelis*
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
o: 630.359.6395
c: 219.384.5143

*A Smart Grid technology company focused on helping consumers of energy
control an often under-managed resource.*

--00504502d7577531c404a67652bf--