hbase-user mailing list archives

From Eran Kutner <e...@gigya.com>
Subject Re: Region server crashes when using replication
Date Tue, 29 Mar 2011 11:29:29 GMT
Thanks again, J-D. I will avoid using stop_replication from now on.
As for the shell, JRuby (or even Java, for that matter) is not really
our strong suit here, but I'll try to give it a look when I have some
time.


On Mon, Mar 28, 2011 at 23:43, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> Inline.
>> Thanks J-D!
>> I disabled replication because at the time, every time I started it
>> the entire cluster would shut itself down.
>> Is there any reason why the servers don't create the HLog immediately
>> when they receive the start_replication command?
> In the current code base, replication has no way to make requests of the
> part responsible for the WAL. In any case, start/stop replication wasn't
> built to do what you're trying to do; it's just a dirty kill switch.
>> Is there a less destructive way to stop and start the replication?
>> Will removing the peer yield better results? (By the way, it would be
>> nice if the shell had a "show_peers" command.)
> There's an enable/disable command I have yet to implement :)
> Adding/removing the peer should do the trick too. I agree we need to
> list peers (that could be a nice first contribution wink wink).
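To make the difference between a global kill switch and per-peer control concrete, here is a minimal Java sketch of per-peer replication state. It is a hypothetical in-memory model, not the HBase API: the class PeerRegistry, its methods, and the per-peer enable/disable flag are illustrative assumptions (the enable/disable command had not been implemented at the time of this thread), and the cluster key format "zk1,zk2,zk3:2181:/hbase" is likewise assumed.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical in-memory model of per-peer replication state; NOT the HBase API.
 * Each peer id maps to a cluster key plus an enabled flag, so a single peer can be
 * paused, removed, or listed without a cluster-wide kill switch.
 */
class PeerRegistry {

    /** State kept for one peer: where it lives and whether shipping is enabled. */
    static final class Peer {
        final String clusterKey; // assumed format, e.g. "zk1,zk2,zk3:2181:/hbase"
        boolean enabled = true;

        Peer(String clusterKey) {
            this.clusterKey = clusterKey;
        }
    }

    // LinkedHashMap keeps peers in insertion order so listings are stable.
    private final Map<String, Peer> peers = new LinkedHashMap<>();

    /** Registers a peer; rejects duplicate ids instead of silently overwriting. */
    public void addPeer(String id, String clusterKey) {
        if (peers.putIfAbsent(id, new Peer(clusterKey)) != null) {
            throw new IllegalArgumentException("peer already exists: " + id);
        }
    }

    /** Removes a peer; returns false if it was not registered. */
    public boolean removePeer(String id) {
        return peers.remove(id) != null;
    }

    /** Pauses or resumes shipping to one peer without touching the others. */
    public void setEnabled(String id, boolean enabled) {
        Peer p = peers.get(id);
        if (p == null) {
            throw new IllegalArgumentException("unknown peer: " + id);
        }
        p.enabled = enabled;
    }

    /** Returns true if edits should currently be shipped to the given peer. */
    public boolean isEnabled(String id) {
        Peer p = peers.get(id);
        return p != null && p.enabled;
    }

    /** The "show_peers" idea from the thread: read-only map of peer id to cluster key. */
    public Map<String, String> listPeers() {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Peer> e : peers.entrySet()) {
            out.put(e.getKey(), e.getValue().clusterKey);
        }
        return Collections.unmodifiableMap(out);
    }
}
```

Keeping the flag per peer means pausing one destination leaves the other peers, and the region servers themselves, untouched, which is exactly what a single start/stop kill switch cannot offer.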
>> 1) It seems that the servers crash if they can't talk to the peer ZK
>> ensemble, which is really a huge problem.
> Like we previously discussed, this only happens when the region server
> starts and it's also very easy to fix (just catch the right
> exception).
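A minimal Java sketch of the kind of fix described in that reply: the region server catches the failure to reach the peer's ZooKeeper ensemble during startup instead of letting it abort the process. All names here (ReplicationStartup, PeerConnector, tryConnect) are hypothetical, and treating an unreachable ensemble as an IOException is an assumption; the exception type HBase actually throws may differ.

```java
import java.io.IOException;
import java.util.logging.Logger;

/**
 * Hypothetical sketch, NOT HBase code: failing to reach a peer's ZooKeeper ensemble
 * at region server startup is logged and left for a later retry instead of
 * propagating and aborting the server.
 */
class ReplicationStartup {

    /** Hypothetical hook that opens a session to a peer's ZK ensemble. */
    interface PeerConnector {
        void connect(String clusterKey) throws IOException;
    }

    private static final Logger LOG = Logger.getLogger(ReplicationStartup.class.getName());

    private final PeerConnector connector;
    private boolean peerConnected = false;

    ReplicationStartup(PeerConnector connector) {
        this.connector = connector;
    }

    /**
     * Tries to reach the peer once. An IOException (assumed to mean "ensemble
     * unreachable") is caught so the region server keeps serving; callers can
     * simply invoke this again later to retry.
     */
    boolean tryConnect(String clusterKey) {
        try {
            connector.connect(clusterKey);
            peerConnected = true;
        } catch (IOException e) {
            LOG.warning("Peer " + clusterKey + " unreachable, will retry later: " + e.getMessage());
            peerConnected = false;
        }
        return peerConnected;
    }

    boolean isPeerConnected() {
        return peerConnected;
    }
}
```

Catching only the connection-level exception keeps the fix narrow: unexpected runtime errors still surface, while an unreachable peer merely leaves replication to that peer marked as disconnected until a retry succeeds.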
>> 2) I can't be certain when the HLogs will actually start being written
>> unless I restart the entire secondary cluster after reversing the
>> replication direction.
> That's when you use start/stop replication, which, like I said, isn't
> designed to do what you want to do. Adding/removing the peers will
> work correctly in this case.
