incubator-cassandra-user mailing list archives

From David McNelis <dmcne...@agentisenergy.com>
Subject Re: Restarting cluster
Date Fri, 24 Jun 2011 13:20:19 GMT
Running on CentOS.

We had a massive power failure and our UPS wasn't up to 48 hours without
power...

In this situation the IP addresses have all stayed the same.  I can still
connect to the "other" node from the cli, so I don't think it's an issue
where the iptables settings weren't saved and started blocking traffic.
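
One caveat on that reasoning: the cli talks to the Thrift port, while gossip
goes over the storage port, so a working cli connection does not by itself
rule out blocked inter-node traffic. Below is a minimal sketch of a check
that could be run from each node against the other. The port numbers are the
stock defaults (storage_port 7000, rpc_port 9160) and are an assumption here,
not something confirmed from this cluster's cassandra.yaml; the helper names
are hypothetical.

```python
import socket

# Assumed defaults, not taken from this cluster's config:
# storage_port 7000 carries gossip and other inter-node traffic,
# rpc_port 9160 is the Thrift port that cassandra-cli connects to.
DEFAULT_PORTS = {"storage (gossip)": 7000, "thrift (cli)": 9160}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def check_ports(host, ports=DEFAULT_PORTS):
    """Map each label to whether its port accepts connections on host."""
    return {label: port_open(host, port) for label, port in ports.items()}
```

Calling check_ports() with the other node's address from each machine would
show whether the storage port is reachable in both directions. If the Thrift
port answers but the storage port does not, the firewall would be worth a
second look.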

As for the log files, the only related lines are:

 INFO [main] 2011-06-24 07:48:44,750 StorageService.java (line 382) Loading persisted ring state
 INFO [main] 2011-06-24 07:48:44,757 StorageService.java (line 418) Starting up server gossip

When I turn on debugging and restart the non-seed node I get this line:

DEBUG [WRITE-/192.168.80.XXX] 2011-06-24 08:04:48,798 OutboundTcpConnection.java (line 161) attempting to connect to /192.168.80.XXX

But no errors after it.


On Fri, Jun 24, 2011 at 7:58 AM, Sasha Dolgy <sdolgy@gmail.com> wrote:

> Normally, no.  What you've done is fine.  What is the environment?
>
> On amazon EC2 for example, the instance could have crashed, a new one
> is brought online and has a different internal IP ...
>
> in the cassandra/logs/system.log are there any messages on the 2nd
> node and how it relates to the seed node?
>
> On Fri, Jun 24, 2011 at 2:49 PM, David McNelis
> <dmcnelis@agentisenergy.com> wrote:
> > I am running 0.8.0 on CentOS.  I have 2 nodes in my cluster, one is a
> > seed, the other is autobootstrapped.
> > After having an unexpected shutdown of both of the physical machines I am
> > trying to restart the cluster.  I first started the seed node, it went
> > through the normal startup process and finished without error.  Once that
> > was complete I started the second node, again no errors in the log as it
> > was starting, it started the gossip server, etc.
> > However when I look at the ring using nodetool, both machines show their
> > own status as up, then show the other machine as Down with a state of
> > Normal and a load of ?.  I have tried restarting the individual nodes in
> > different orders, waiting a while after restarting a node, but still the
> > 'other' node always has a status of "down".  nodetool repair [keyspace]
> > did not make any difference either and nodetool join just told me that
> > the nodes were already a part of the ring.
> > I can't imagine this is how it *should* be behaving... is there a piece
> > I'm missing in terms of getting one node to recognize the other as being
> > Up?
>



-- 
*David McNelis*
Lead Software Engineer
Agentis Energy
www.agentisenergy.com
o: 630.359.6395
c: 219.384.5143

*A Smart Grid technology company focused on helping consumers of energy
control an often under-managed resource.*
