hbase-user mailing list archives

From James Baldassari <jbaldass...@gmail.com>
Subject Re: Very long time between node failure and reassigning of regions.
Date Mon, 26 Apr 2010 21:00:18 GMT
Hi Michal,

I'm not an HBase committer, but my organization uses HBase heavily in our
production environment, which processes requests in real-time.  I can share
with you a couple of our strategies and best practices for using HBase in
this type of environment:

1. We use an intermediate caching layer for higher availability and fewer
requests to HBase.  This caching layer consists of Ehcache with
read-through/write-through to Memcached.  If the data is not in Ehcache or
Memcached, we queue up a request to get the data from HBase in the
background (a rough sketch of this read path follows after point 2).  This
strategy reduces the load on HBase and protects us against any
availability/reliability problems with the HBase cluster.  If the data is
not available in the cache, we are still able to process the request.
Maybe that is not possible for your application.

2. We would never intentionally shut down a region server in our production
environment and expect everything to be fine with HBase.  If we have to
shut down a region server, which happens rarely, we stop all requests to
HBase until we have completed our maintenance and all region servers are
back online (sketched below as well).  The caching layer I mentioned above
allows us to do this with no interruption to service.  This strategy might
be overly conservative, but we'd rather not take any chances.
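
For the read path in point 1, here is a minimal sketch of the idea.  It
assumes the spymemcached client and the 0.20-era HTable API; the class
name, table name, and column names are made up for illustration, and the
Ehcache front tier is omitted for brevity.

    import java.io.IOException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import net.spy.memcached.AddrUtil;
    import net.spy.memcached.MemcachedClient;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical read path: serve from the cache if possible; on a miss,
    // schedule a background HBase read so the caller never blocks on HBase.
    public class CachedReader {
        private final MemcachedClient memcached;
        private final HTable table;
        // HTable is not thread-safe, so a single loader thread owns it.
        private final ExecutorService loader = Executors.newSingleThreadExecutor();

        public CachedReader(String memcachedHosts, String tableName) throws IOException {
            this.memcached = new MemcachedClient(AddrUtil.getAddresses(memcachedHosts));
            this.table = new HTable(new HBaseConfiguration(), tableName);
        }

        /** Returns the cached value, or null after queuing a background load. */
        public byte[] get(final String rowKey) {
            Object cached = memcached.get(rowKey);
            if (cached != null) {
                return (byte[]) cached;
            }
            // Cache miss: fetch from HBase in the background and warm the cache.
            loader.submit(new Runnable() {
                public void run() {
                    try {
                        Result r = table.get(new Get(Bytes.toBytes(rowKey)));
                        byte[] value = r.getValue(Bytes.toBytes("d"), Bytes.toBytes("v"));
                        if (value != null) {
                            memcached.set(rowKey, 3600, value);  // e.g. one-hour TTL
                        }
                    } catch (IOException e) {
                        // HBase unavailable: keep serving whatever the cache has.
                    }
                }
            });
            return null;
        }
    }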
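
And for point 2, a rough sketch of the kind of switch we flip during
maintenance; the names are hypothetical, not our actual code.  Request
handlers consult the gate before touching HBase and fall back to the
cache tier when it is closed.

    import java.util.concurrent.atomic.AtomicBoolean;

    // Hypothetical maintenance gate: while tripped, request handlers serve
    // only from the cache tier and issue nothing against HBase.
    public class HBaseGate {
        private static final AtomicBoolean MAINTENANCE = new AtomicBoolean(false);

        public static void enterMaintenance() { MAINTENANCE.set(true); }
        public static void leaveMaintenance() { MAINTENANCE.set(false); }

        /** Handlers check this before issuing any Get/Put to HBase. */
        public static boolean hbaseAllowed() {
            return !MAINTENANCE.get();
        }
    }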

-James


2010/4/26 Michał Podsiadłowski <podsiadlowski@gmail.com>

> Hi Todd,
>
> Thanks for your input. Your words are making me sad though. I'm using
> 0.20.4 taken from trunk around the beginning of April; I can tell you the
> exact version tomorrow.
> With respect to 1), we are only shutting down, not even killing, the
> region servers; the datanodes are still working. This is not the first
> time we have managed to break the whole cluster just by shutting down
> region servers.
>
> Cheers,
> Michal
>
>
> 2010/4/26 Todd Lipcon <todd@cloudera.com>:
> > Hi Michal,
> >
> > What version of HBase are you running?
> >
> > All currently released versions of HBase have known bugs with recovery
> > under crash scenarios, many of which have to do with the lack of a
> > sync() feature in released versions of HDFS.
> >
> > The goal for HBase 0.20.5, due out in the next couple of months, is to
> > fix all of these issues to achieve cluster stability under failure.
> >
> > I'm working full time on this branch, and happy to report that as of
> > yesterday I have a 40-threaded client which is inserting records into a
> > cluster where I am killing a region server once every 1-2 minutes, and
> > it is recovering completely and correctly through every failure. The
> > test has been running for about 24 hours, and no regions have been
> > lost, etc.
> >
> > My next step is to start testing under 2-node failure scenarios, master
> > failure scenarios, etc.
> >
> > Regarding your specific questions:
> >
> > 1) When you have a simultaneous failure of 3 nodes, you will have blocks
> > become unavailable in the underlying HDFS. Thus, HBase has no recourse
> > to be able to continue operating correctly, since its data won't be
> > accessible and any edit logs writing to that set of 3 nodes will fail to
> > append. Thus, I don't think we can reasonably expect to do anything to
> > recover from this situation. We should shut down the cluster in such a
> > way that, after HDFS has been restored, we can restart HBase without
> > missing regions, etc. There are probably bugs here, currently, but this
> > is lower on the priority list compared to more common scenarios.
> >
>
> > 2) When a region is being reassigned, it does take some time to recover.
> > In my experience, a loss of a region server hosting META does take about
> > 2 minutes to fully reassign. The loss of a region server not holding
> > META takes about 1 minute to fully reassign. This is with a 1 minute ZK
> > session timeout. With shorter timeouts, you will detect failure faster,
> > but are more likely to have false failure detections due to GC pauses,
> > etc. We're working on improving this for 0.21.
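> >
> > For reference, a sketch of where that timeout lives, assuming the
> > zookeeper.session.timeout key (milliseconds) documented for that era,
> > set in hbase-site.xml on the region servers:
> >
> >     <!-- hbase-site.xml (sketch): the ZooKeeper session timeout bounds
> >          how quickly a dead region server is noticed -->
> >     <property>
> >       <name>zookeeper.session.timeout</name>
> >       <value>60000</value>  <!-- 1 minute, as above; shorter detects
> >                                  failure faster but risks false positives
> >                                  during GC pauses -->
> >     </property>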
> >
> > Regarding the suitability of this for a real time workload, there are
> > some ideas floating around for future work that would make the regions
> > available very quickly in a readonly/stale data mode while the logs are
> > split and recovered. This is probably not going to happen in the short
> > term, as it will be tricky to do correctly, and there are more pressing
> > issues.
> >
> > Thanks
> > -Todd
> >
> >
> >
> >
> >
> > 2010/4/26 Michał Podsiadłowski <podsiadlowski@gmail.com>
> >
> >>  Hi Edward,
> >>
> >> this is not good news for us. If you get 30 seconds under low load,
> >> then our 3 minutes are quite normal, especially because your records
> >> are quite big and there are lots of removals and inserts. I just wonder
> >> whether our use case is outside HBase's sweet spot, or whether HBase
> >> availability is simply low. Do you know of any changes to the
> >> architecture in 0.21? As far as I can see, part of the problem is
> >> splitting the logs from a dead node into the per-region log files.
> >> Is there any way we could speed up recovery? And can someone explain
> >> what happened when we shut down 3 of 6 region servers? Why did the
> >> cluster get into an inconsistent state with so many missing regions?
> >> Is this such an unusual situation that HBase can't handle it?
> >>
> >> Thanks,
> >> Michal
> >>
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>
