hbase-user mailing list archives

From Nick Puz <npuz...@me.com>
Subject Re: Question about WAL writes after region server "soft failures"
Date Fri, 07 Sep 2012 20:03:52 GMT
Hi Jimmy, 
Thanks for the quick response. If the paused region server already has the file open and
is writing to it (a stream currently open to a data node, presumably the local one), will the
stream be marked as unusable so that the write fails? I guess this may be more of an HDFS question.


On Sep 07, 2012, at 12:32 PM, Jimmy Xiang <jxiang@cloudera.com> wrote:

Hi Nick,

When the dead region server comes back, it won't be able to write data
to the WAL any more.
As the first step of log splitting, the WAL folder for the dead
region server is renamed. When the dead region server tries to write
to the WAL, it will find that the file is no longer there.
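
A minimal sketch of the rename step described above, using a local temporary directory as a stand-in for HDFS. The directory name `rs1`, the file names, and the `-splitting` suffix are illustrative assumptions, not taken from this thread. It only shows that creating a new log file under the old path fails after the rename; it says nothing about whether a stream that is already open gets fenced.

```python
import os
import shutil
import tempfile


def simulate_wal_rename():
    """Rename a toy WAL folder, then try to create a new log file under the old path."""
    root = tempfile.mkdtemp()
    try:
        wal_dir = os.path.join(root, "rs1")  # hypothetical WAL folder for RS1
        os.mkdir(wal_dir)
        with open(os.path.join(wal_dir, "log.1"), "w") as f:
            f.write("edit-1\n")

        # Log splitting begins: the folder is renamed (suffix is illustrative).
        os.rename(wal_dir, wal_dir + "-splitting")

        # The "dead" server comes back and tries to roll to a new log file.
        try:
            open(os.path.join(wal_dir, "log.2"), "w").close()
            return "created"
        except FileNotFoundError:
            return "failed"
    finally:
        shutil.rmtree(root)


print(simulate_wal_rename())
```

On a local filesystem the old directory path no longer exists after the rename, so the create attempt raises `FileNotFoundError`.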


On Fri, Sep 7, 2012 at 12:19 PM, Nick Puz <npuz.os@me.com> wrote:
> I'm new to HBase and HDFS and have a question about what happens when a
> failure is detected and a new region server takes over a region. If the old
> region server hasn't really failed and "comes back" will it still accept
> writes?
> Here's a specific sequence of events:
> 1) region R is currently being served by region server RS1.
> 2) RS1 hangs for some reason (long GC, network hiccup, etc.)
> 3) the region master gets notified that RS1 is down, so it splits logs and
> reassigns. Looking at the code, splitting logs renames the log directory, so
> if RS1 tries to create a new log file it will fail.
> 4) region server RS2 is assigned the region, replays the log, and all is
> well.
> 5) RS1 comes back to life.
> After 5 happens:
> - if it had in-flight requests, will it write them to the WAL and eventually
> flush the memtables?
> - if it gets new requests, will it service them as long as it is still
> appending to the same block in the WAL file?
> One way to prevent the clients from getting acks would be to set the client
> timeout to be less than the ZooKeeper session timeout
> (zookeeper.session.timeout), which seems like a logical thing to do.
> But even if the timeouts were set so that the client got a timeout, are there
> scenarios in which the edits would be readable by other clients (say, if that
> log file were rescanned)?
> Thanks,
> -Nick
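
The timeout relationship suggested in the quoted message can be expressed as a small check. This is only a sketch: it assumes `hbase.rpc.timeout` is the client-side timeout in question (the message does not name one), and the millisecond values below are placeholders, not recommended settings.

```python
def client_times_out_first(conf):
    """Return True if the client timeout is strictly shorter than the
    ZooKeeper session timeout (both in milliseconds)."""
    # "hbase.rpc.timeout" is an assumed stand-in for "the client timeout".
    return conf["hbase.rpc.timeout"] < conf["zookeeper.session.timeout"]


# Placeholder values for illustration only.
conf = {
    "zookeeper.session.timeout": 180000,
    "hbase.rpc.timeout": 60000,
}
print(client_times_out_first(conf))
```

As the question itself notes, a client-side timeout only controls whether the client sees an ack; it does not by itself say whether a late edit could still end up in the log.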
