hbase-user mailing list archives

From Nick Puz <npuz...@me.com>
Subject Question about WAL writes after region server "soft failures"
Date Fri, 07 Sep 2012 19:19:03 GMT
I'm new to HBase and HDFS and have a question about what happens when a failure is detected and
a new region server takes over a region. If the old region server hasn't really failed and
"comes back", will it still accept writes?

Here's a specific sequence of events: 
1) region R is currently being served by region server RS1. 
2) RS1 hangs for some reason (long GC pause, network hiccup, etc.).
3) The master is notified that RS1 is down, so it splits RS1's logs and reassigns its regions.
Looking at the code, log splitting renames the log directory, so if RS1 tries to create a new
log file, that will fail.
4) region server RS2 is assigned the region, replays the log, and all is well. 
5) RS1 comes back to life. 

After 5 happens:
- If it had in-flight requests, will it write them to the WAL and eventually flush the memtables?
- If it gets new requests, will it service them as long as it is still appending to the same
block in the WAL file?
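To make the distinction in these questions concrete, here is a toy in-memory model, not HBase code: renaming the log directory blocks creation of a new log file, while an already-open writer still holds a reference to its file. Every class and method name is hypothetical, and the "-splitting" suffix is only an illustrative name for the renamed directory. The model assumes the open writer can keep appending after the rename; it does not model HDFS lease handling at all, so it says nothing about what a real open HDFS stream does during a split. That is exactly the open question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of WAL directory fencing (not HBase code). */
public class WalFencingSketch {

    /** Toy "filesystem": directory name -> (file name -> appended edits). */
    static final Map<String, Map<String, List<String>>> dirs = new HashMap<>();

    /** Holds a direct reference to its file, standing in for an open output stream. */
    static final class WalWriter {
        private final List<String> file;
        WalWriter(List<String> file) { this.file = file; }
        // Assumption of this model: appends ignore whether the directory was renamed.
        void append(String edit) { file.add(edit); }
    }

    /** Creating a new log file requires the server's original directory to still exist. */
    static WalWriter createLog(String dir, String name) {
        Map<String, List<String>> files = dirs.get(dir);
        if (files == null) {
            throw new IllegalStateException("log directory " + dir + " no longer exists");
        }
        List<String> file = new ArrayList<>();
        files.put(name, file);
        return new WalWriter(file);
    }

    /** Step 3 in the model: move the failed server's directory out of the way. */
    static void renameForSplitting(String dir) {
        dirs.put(dir + "-splitting", dirs.remove(dir));
    }

    public static void main(String[] args) {
        dirs.put("rs1", new HashMap<>());
        WalWriter open = createLog("rs1", "wal.1"); // RS1 is writing before it hangs
        open.append("edit-a");

        renameForSplitting("rs1");                   // the master starts splitting

        open.append("edit-b");                       // step 5: the already-open writer appends
        boolean newLogFailed = false;
        try {
            createLog("rs1", "wal.2");               // rolling to a new log file fails
        } catch (IllegalStateException e) {
            newLogFailed = true;
        }
        System.out.println("edits in renamed file: " + dirs.get("rs1-splitting").get("wal.1"));
        System.out.println("new log creation failed: " + newLogFailed);
    }
}
```

In this model the late edit lands in the renamed file. Whether that can happen on a real cluster, and whether the splitter would ever read it, depends on how HDFS and the splitter treat the still-open file.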

One way to prevent clients from receiving acks would be to set the client timeout lower than
the ZooKeeper session timeout (zookeeper.session.timeout), which seems like a logical thing
to do.
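The reasoning behind this rule of thumb can be written as a small timeline check. This is a sketch under stated assumptions, not an HBase API. The method name is hypothetical and all times are plain milliseconds. It assumes RS1 stops responding (and stops heartbeating to ZooKeeper) at the moment the write is sent, so the master cannot declare RS1 dead before the session timeout has elapsed from that point. The 180,000 ms value is a placeholder, not a claim about any HBase default.

```java
/** Hypothetical sketch of the timeout rule of thumb (not an HBase API). */
public class TimeoutRuleSketch {

    /**
     * Assumes RS1 hangs at sendMs. Returns true if the client abandons the
     * write strictly before RS1's ZooKeeper session can expire.
     */
    static boolean clientGivesUpBeforeReassignment(long sendMs, long clientTimeoutMs,
                                                   long zkSessionTimeoutMs) {
        long clientDeadlineMs = sendMs + clientTimeoutMs;
        long declaredDeadMs = sendMs + zkSessionTimeoutMs; // under the assumption above
        return clientDeadlineMs < declaredDeadMs;
    }

    public static void main(String[] args) {
        long zkSessionTimeoutMs = 180_000; // placeholder for zookeeper.session.timeout
        System.out.println(clientGivesUpBeforeReassignment(0, 60_000, zkSessionTimeoutMs));
        System.out.println(clientGivesUpBeforeReassignment(0, 240_000, zkSessionTimeoutMs));
    }
}
```

With a 60 s client timeout the first call prints `true`; with 240 s the second prints `false`. The check only covers a single request whose hang starts when it is sent, which is why the follow-up question below still applies.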

But even if the timeouts were set so that the client got a timeout, are there scenarios where
the edits would be readable by other clients (say, if that log file were rescanned)?

