cassandra-user mailing list archives

From Alexandru Sicoe <adsi...@gmail.com>
Subject Re: repair never finishing 1.0.7
Date Tue, 26 Jun 2012 06:16:47 GMT
Hi Andras,

I am not using a VPN. The system had been running successfully in this
configuration for a couple of weeks before I noticed that repair was not
working.

Here is what happens: on each Cassandra node I configure iptables so that
packets sent to any of the IPs in the other DC (on ports 7000, 9160 and
7199) are forwarded to the gateway IP. The gateway does the NAT, sending
the packets on to the real destination IP on the other side, having
replaced the source IP with the initial sender's IP (at least, that is my
understanding of it).
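To make the two possible gateway behaviours concrete, here is a minimal Python model of the forwarding step described above. It is a sketch under stated assumptions, not a description of any real gateway: the addresses are hypothetical (10.2.0.5 standing in for a DC2 node, 10.1.0.3 for a DC1 node, 192.0.2.1 for the gateway's far-side address), and `masquerade` is a made-up switch contrasting source preservation (the understanding stated above) with source rewriting, which is what SNAT/MASQUERADE rules typically do.

```python
from dataclasses import dataclass, replace

# Hypothetical values for illustration; not taken from the real setup.
FORWARDED_PORTS = {7000, 9160, 7199}  # ports the gateway forwards
GATEWAY_FAR_SIDE_IP = "192.0.2.1"     # gateway address as seen by the other DC


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int


def gateway_forward(pkt: Packet, masquerade: bool) -> Packet:
    """Return the packet as the (modelled) gateway would emit it on the other side.

    masquerade=False keeps the original source IP (the sender's own address).
    masquerade=True rewrites the source to the gateway, as SNAT would.
    The destination stays the real node IP in both cases.
    """
    if pkt.dport not in FORWARDED_PORTS:
        raise ValueError(f"port {pkt.dport} is not forwarded by the gateway")
    if masquerade:
        return replace(pkt, src=GATEWAY_FAR_SIDE_IP)
    return pkt


if __name__ == "__main__":
    outgoing = Packet(src="10.2.0.5", dst="10.1.0.3", dport=7000)
    print(gateway_forward(outgoing, masquerade=False).src)  # 10.2.0.5
    print(gateway_forward(outgoing, masquerade=True).src)   # 192.0.2.1
```

In the first case the DC1 node must be able to route its replies back to 10.2.0.5 through the gateway; in the second it sees every DC2 peer as the gateway's address. Either way, which address each node advertises, and whether it is reachable from the other DC, is what matters.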

Given this configuration, what might the problem be, and how can I fix it?

Cheers,
Alex

On Mon, Jun 25, 2012 at 12:47 PM, Andras Szerdahelyi <
andras.szerdahelyi@ignitionone.com> wrote:

>
>   The DCs are communicating over a gateway where I do NAT for ports 7000,
> 9160 and 7199.
>
>
>  Ah, that sounds familiar. You don't mention if you are VPN'd or not.
> I'll assume you are not.
>
>  So, your nodes are behind network address translation. Does that mean
> they advertise (broadcast) their internal IP to each other, or their
> translated/forwarded one? In my experience, setting up a Cassandra ring
> across NAT (without a VPN) is impossible. Either the nodes on your local
> network won't be able to communicate with each other, because they
> broadcast their translated (public) address, which is normally (depending
> on router configuration) not routable from within the local network, or
> the nodes broadcast their internal IP, in which case the "outside" nodes
> have no way to connect into the local network. On the DC2 nodes (or at
> least the node you issue the repair on), check for any sockets being
> opened to the internal addresses of the nodes in DC1.
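The socket check suggested above can be done with `netstat -tn` on the repairing node. As a scriptable alternative, here is a minimal Python sketch that parses Linux's `/proc/net/tcp` (IPv4 only) and lists connections on Cassandra's ports to a given set of DC1 internal addresses. Assumptions: a Linux host with little-endian byte order (e.g. x86), which is how the kernel's hex addresses are decoded here; the DC1 address and the sample line are hypothetical. A socket stuck in SYN_SENT to an internal address would be consistent with the "advertised address is not routable" failure described above.

```python
import socket
import struct

CASSANDRA_PORTS = {7000, 9160, 7199}
# Hex state codes used in /proc/net/tcp (subset).
TCP_STATES = {"01": "ESTABLISHED", "02": "SYN_SENT", "06": "TIME_WAIT", "0A": "LISTEN"}

# Hypothetical sample: a DC2 node (10.2.0.5) trying to reach 10.1.0.3:7000.
SAMPLE = (
    "  sl  local_address rem_address   st tx_queue rx_queue\n"
    "   0: 0500020A:D431 0300010A:1B58 02 00000000:00000000\n"
)


def decode_addr(field: str) -> tuple[str, int]:
    """Decode an 'IIIIIIII:PPPP' field; the IP hex is in host (little-endian) order."""
    ip_hex, port_hex = field.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def sockets_to(proc_net_tcp: str, targets: set[str]) -> list[tuple[str, int, str]]:
    """List (remote_ip, remote_port, state) for Cassandra-port sockets to targets."""
    hits = []
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        remote_ip, remote_port = decode_addr(fields[2])
        if remote_ip in targets and remote_port in CASSANDRA_PORTS:
            hits.append((remote_ip, remote_port, TCP_STATES.get(fields[3], fields[3])))
    return hits


if __name__ == "__main__":
    print(sockets_to(SAMPLE, {"10.1.0.3"}))  # [('10.1.0.3', 7000, 'SYN_SENT')]
```

On a real node, pass the contents of `/proc/net/tcp` and the actual internal IPs of the DC1 nodes instead of the sample.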
>
>
>  regards,
> Andras
>
>
>
>  On 25 Jun 2012, at 11:57, Alexandru Sicoe wrote:
>
> Hello everyone,
>
>  I have a 2-DC (DC1:3 and DC2:6) Cassandra 1.0.7 setup. I have about
> 300GB per node in DC2.
>
>  The DCs are communicating over a gateway where I do NAT for ports 7000,
> 9160 and 7199.
>
>  I did a "nodetool repair" on a node in DC2 without any external load on
> the system.
>
>  It took 5 hours to finish the Merkle tree calculations (which is fine for
> me), but then nothing happens in the streaming phase ("nodetool netstats"
> shows 0%) and it stays like that indefinitely. Note: it has to stream
> to/from nodes in DC1!
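For context on the two phases mentioned here: repair has each replica build a Merkle tree (a hash tree) over its token ranges, compares the trees, and then streams only the ranges whose hashes differ, which is the step that stalls. The following is a toy Python sketch of that compare step, not Cassandra's implementation: the tiny ring size, the 8 leaf ranges, SHA-256 and the `leaf_hashes`/`build_tree`/`mismatched_ranges` helpers are all hypothetical choices for illustration.

```python
import hashlib


def leaf_hashes(rows: dict[int, bytes], num_ranges: int, ring_size: int) -> list[bytes]:
    """Hash the rows in each of num_ranges equal-width token ranges."""
    buckets = [hashlib.sha256() for _ in range(num_ranges)]
    for token in sorted(rows):  # deterministic order so equal data hashes equally
        buckets[token * num_ranges // ring_size].update(token.to_bytes(8, "big") + rows[token])
    return [b.digest() for b in buckets]


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Return levels from leaves up to the root; len(leaves) must be a power of two."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([hashlib.sha256(prev[i] + prev[i + 1]).digest()
                       for i in range(0, len(prev), 2)])
    return levels


def mismatched_ranges(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Descend from the root into differing subtrees only; return differing leaf indices."""
    out: list[int] = []

    def walk(level: int, idx: int) -> None:
        if a[level][idx] == b[level][idx]:
            return  # identical subtree: nothing to stream for these ranges
        if level == 0:
            out.append(idx)
            return
        walk(level - 1, 2 * idx)
        walk(level - 1, 2 * idx + 1)

    walk(len(a) - 1, 0)
    return out


if __name__ == "__main__":
    RING = 1024  # toy token space [0, 1024)
    local = {10: b"x", 300: b"y", 900: b"z"}
    remote = {10: b"x", 300: b"Y", 900: b"z"}  # one row differs
    t_local = build_tree(leaf_hashes(local, 8, RING))
    t_remote = build_tree(leaf_hashes(remote, 8, RING))
    print(mismatched_ranges(t_local, t_remote))  # [2]
```

In this toy only range 2 would need streaming; the comparison itself is cheap once the trees exist, so a repair that finishes the trees but never moves data points at the transfer between the DCs rather than at the tree computation.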
>
>  I tried again, with the same result.
>
>  Looking around I found this thread
>
> http://www.mail-archive.com/user@cassandra.apache.org/msg22167.html
>  which seems to describe the same problem.
>
> The thread gives two suggestions:
> - a full cluster restart allows the first attempted repair to complete
> (I haven't tested this yet; it is not practical even if it works)
> - issue https://issues.apache.org/jira/browse/CASSANDRA-4223 may be the
> problem
>
> Questions:
> 1) How can I confirm that the JIRA issue above is my actual problem? (I
> see no errors or warnings in the logs, and there is no other activity.)
> 2) What should I do to make the repairs work? (If the JIRA issue is the
> problem, I see there is a fix for it in version 1.0.11, which has not been
> released yet.)
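On question 2, a note on checking whether a given build includes the fix: release strings need to be compared numerically, because as plain strings "1.0.7" sorts after "1.0.11". A minimal Python sketch, assuming plain dotted numeric versions (no suffixes such as "-beta"); the 1.0.11 fix version is taken from the message above, and `has_fix` is a hypothetical helper.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """'1.0.7' -> (1, 0, 7); assumes purely numeric dotted components."""
    return tuple(int(part) for part in v.split("."))


FIXED_IN = "1.0.11"  # fix version for CASSANDRA-4223, as stated in the thread


def has_fix(running: str, fixed_in: str = FIXED_IN) -> bool:
    """True if the running version is at or past the fix version."""
    return parse_version(running) >= parse_version(fixed_in)


if __name__ == "__main__":
    print("1.0.7" >= "1.0.11")  # True: string comparison gets this wrong
    print(has_fix("1.0.7"))     # False
    print(has_fix("1.0.11"))    # True
```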
>
> Thanks,
> Alex
>
>
>
