hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Very long time between node failure and reassignment of regions.
Date Mon, 26 Apr 2010 20:21:39 GMT
Hi Michal,

What version of HBase are you running?

All currently released versions of HBase have known bugs with recovery under
crash scenarios, many of which have to do with the lack of a sync() feature
in released versions of HDFS.
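
For context, the missing piece looks like this: on an HDFS build that
includes the append/sync work, a writer can force its buffered edits out to
the datanode pipeline before acknowledging a write. A minimal sketch (the
file path is hypothetical, just for illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SyncSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical WAL-like file, just for illustration.
        FSDataOutputStream out = fs.create(new Path("/tmp/sync-sketch"));
        out.write("some edit".getBytes());
        // On a sync-capable HDFS this flushes the bytes to all datanodes
        // in the pipeline, so the edit survives a writer crash. On stock
        // 0.20 HDFS the call exists but does not give that durability.
        out.sync();
        out.close();
      }
    }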

The goal for HBase 0.20.5, due out in the next couple of months, is to fix
all of these issues to achieve cluster stability under failure.

I'm working full time on this branch, and I'm happy to report that as of
yesterday I have a 40-thread client inserting records into a cluster while I
kill a region server every 1-2 minutes, and it is recovering completely and
correctly through every failure. The test has been running for about 24
hours, and no regions have been lost.
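
If you want to reproduce something like this, the client side is nothing
fancy; each thread just does puts in a loop. A minimal sketch against the
0.20 client API (the table and column names are placeholders, not what my
test actually uses):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class InsertThread extends Thread {
      public void run() {
        try {
          // HTable is not thread-safe; give each thread its own instance.
          HTable table = new HTable(new HBaseConfiguration(), "testtable");
          for (long i = 0; ; i++) {
            Put put = new Put(Bytes.toBytes(getName() + "-" + i));
            put.add(Bytes.toBytes("f"), Bytes.toBytes("q"),
                    Bytes.toBytes(System.currentTimeMillis()));
            // The client blocks and retries here while regions are being
            // reassigned after a region server dies.
            table.put(put);
          }
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    }

Start 40 of these, kill -9 a region server every minute or two, and
afterwards scan the table to verify nothing went missing.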

My next step is to start testing under 2-node failure scenarios, master
failure scenarios, etc.

Regarding your specific questions:

1) When you have a simultaneous failure of 3 nodes, some blocks will become
unavailable in the underlying HDFS. HBase then has no way to continue
operating correctly, since its data won't be accessible and any edit logs
being written to that set of 3 nodes will fail to append. So I don't think
we can reasonably expect to recover from this situation. What we should do
is shut down the cluster in such a way that, after HDFS has been restored,
we can restart HBase without missing regions, etc. There are probably bugs
here currently, but this is lower on the priority list compared to more
common scenarios.
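
To put a rough number on it: assuming your datanodes are colocated with the
6 region servers, the default dfs.replication of 3, and (unrealistically)
uniform block placement, then for any given block:

    ways to place 3 replicas across 6 nodes:   C(6,3) = 20
    placements entirely on the 3 dead nodes:   C(3,3) = 1
    => about 1/20 = 5% of blocks lose every replica

So with any reasonable amount of data, some store files and logs will be
unreadable until those nodes come back.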

2) When a region is being reassigned, it does take some time to recover. In
my experience, the loss of a region server hosting META takes about 2
minutes to fully reassign, and the loss of a region server not holding META
takes about 1 minute. This is with a 1-minute ZK session timeout. With a
shorter timeout you will detect failure faster, but you are more likely to
get false failure detections due to GC pauses, etc. We're working on
improving this for 0.21.
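
The timeout is the zookeeper.session.timeout property in hbase-site.xml. If
you want to experiment, something like this trades detection latency against
false positives (30 seconds is just an example value, not a recommendation):

    <property>
      <name>zookeeper.session.timeout</name>
      <value>30000</value>
      <!-- milliseconds; shorter means faster failure detection, but a
           long GC pause can expire the session of a healthy server -->
    </property>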

Regarding the suitability of this for a real-time workload, there are some
ideas floating around for future work that would make regions available
very quickly in a read-only/stale-data mode while the logs are split and
recovered. This is probably not going to happen in the short term, as it
will be tricky to do correctly, and there are more pressing issues.

Thanks
-Todd

2010/4/26 Michał Podsiadłowski <podsiadlowski@gmail.com>

>  Hi Edward,
>
> This is not good news for us. If you get 30 seconds under low load,
> our 3 minutes seem quite normal, especially since your records are
> quite big and there are lots of removals and inserts. I just wonder
> whether our use cases fall outside HBase's sweet spot, or whether
> HBase availability is simply low. Do you know anything about changes
> to the architecture in 0.21? As far as I can see, part of the problem
> is splitting the logs from the dead node into the per-region log
> files.
> Is there any way we could speed up recovery? And can someone explain
> what happened when we shut down 3 of our 6 region servers? Why did the
> cluster get into an inconsistent state with so many missing regions?
> Is this such an unusual situation that HBase can't handle it?
>
> Thanks,
> Michal
>



-- 
Todd Lipcon
Software Engineer, Cloudera
