hbase-dev mailing list archives

From Claudiu Soroiu <csor...@gmail.com>
Subject Re: HBase region server failure issues
Date Wed, 16 Apr 2014 04:35:13 GMT
Yes, overall the second WAL would contain the same data, just distributed
differently.
A server's second WAL would hold the edits of the regions it would take over
if their current host fails.
It is just an idea, as duplicating the data across the cluster might not be
worth the cost.
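
To make it a bit more concrete, here is a rough sketch in plain Java of how
the failover targets could be planned up front (the class and method names
below are made up for illustration; nothing like this exists in HBase today):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "second WAL" idea: decide ahead of time which
// server would take over each region, so every server can keep a secondary
// WAL holding only the edits of the regions it would inherit on a failure.
public final class BackupWalPlan {

  /**
   * region -> designated failover server, chosen round-robin over the other
   * servers. Assumes at least two distinct servers in the list.
   */
  static Map<String, String> planFailover(Map<String, String> regionToServer,
                                          List<String> servers) {
    Map<String, String> plan = new HashMap<>();
    int i = 0;
    for (Map.Entry<String, String> e : regionToServer.entrySet()) {
      String currentHost = e.getValue();
      String backup;
      do {
        backup = servers.get(i++ % servers.size());  // skip the current host
      } while (backup.equals(currentHost));
      plan.put(e.getKey(), backup);
    }
    return plan;
  }

  public static void main(String[] args) {
    Map<String, String> regions = Map.of("r1", "A", "r2", "B", "r3", "C");
    System.out.println(planFailover(regions, List.of("A", "B", "C")));
    // Every region is mapped to a server other than its current host; if that
    // host dies, the backup already has the region's edits in its second WAL.
  }
}

The plan itself is cheap to compute; the real cost of the idea is streaming
every edit to one extra server, which is the duplication concern above.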




On Wed, Apr 16, 2014 at 12:36 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> Would the second WAL contain the same contents as the first ?
>
> We already have code that adds an interceptor on the calls to
> namenode#getBlockLocations so that blocks on the same DN as the dead RS are
> placed at the end of the priority queue.
> See addLocationsOrderInterceptor()
> in hbase-server/src/main/java/org/apache/hadoop/hbase/fs/HFileSystem.java
>
> This is for faster recovery in case the region server is deployed on the
> same box as the datanode.
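
To illustrate the effect of that interceptor, here is a toy helper in plain
Java; it is not the actual code in HFileSystem.java, which proxies the
namenode calls and reorders the HDFS block locations themselves:

import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of the reordering: replicas that live on the datanode
// co-located with the dead region server are pushed to the end of the list,
// so WAL-splitting readers prefer datanodes that are known to be alive.
public final class DeprioritizeDeadHost {

  /** Returns a copy of replicaHosts with the dead RS's host moved last. */
  static List<String> reorder(List<String> replicaHosts, String deadRsHost) {
    List<String> ordered = new ArrayList<>();
    for (String host : replicaHosts) {
      if (!host.equals(deadRsHost)) {
        ordered.add(host);          // live replicas keep their original order
      }
    }
    for (String host : replicaHosts) {
      if (host.equals(deadRsHost)) {
        ordered.add(host);          // the dead RS's replica goes last
      }
    }
    return ordered;
  }

  public static void main(String[] args) {
    // Block replicated on three nodes; rs2 just died along with its datanode.
    System.out.println(reorder(List.of("rs2", "rs1", "rs3"), "rs2"));
    // -> [rs1, rs3, rs2]
  }
}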
>
>
> On Tue, Apr 15, 2014 at 1:43 PM, Claudiu Soroiu <csoroiu@gmail.com> wrote:
>
> > First of all, thanks for the clarifications.
> >
> > **how about 300 regions with 3x replication?  Or 1000 regions? This
> > is going to be 3000 files. on HDFS. per one RS.**
> >
> > Now I see that the trade-off is how to reduce the recovery time without
> > affecting the overall performance of the cluster.
> > Having too many WALs hurts write performance, so multiple WALs might
> > improve the recovery process, but the number of WALs should stay
> > relatively small.
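
One way to keep that number small is to cap the WAL count per region server
and hash regions onto those few WALs; a made-up sketch of the idea, not HBase
code:

import java.util.List;

// Hypothetical illustration of a small, fixed WAL count per region server:
// every region is hashed into one of N WAL "groups", so N stays constant no
// matter how many regions the server hosts.
public final class WalGroupChooser {

  private final int numWalGroups;

  WalGroupChooser(int numWalGroups) {
    this.numWalGroups = numWalGroups;
  }

  /** Index of the WAL that the given region's edits are appended to. */
  int walGroupFor(String encodedRegionName) {
    return Math.floorMod(encodedRegionName.hashCode(), numWalGroups);
  }

  public static void main(String[] args) {
    WalGroupChooser chooser = new WalGroupChooser(4);   // e.g. 4 WALs per RS
    for (String region : List.of("region-001", "region-002", "region-003")) {
      System.out.println(region + " -> WAL #" + chooser.walGroupFor(region));
    }
  }
}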
> >
> > Would it be feasible to know ahead of time where a region might activate
> > in case of a failure, and to have for each region server a second WAL
> > file containing backup edits?
> > E.g. if machine B crashes, then one of its regions will be assigned to
> > node A, one to node C, etc.
> > Viewed the other way around: server A will back up one region from
> > server B in case it crashes, one region from server C, etc. This second
> > WAL would contain exactly the data needed to quickly recover a crashed
> > node.
> > This adds additional redundancy and some complexity to the solution, but
> > it ensures data locality and faster recovery in case of a crash.
> >
> > **What did you do Claudiu to get the time down?**
> >
> >  Decreased the HDFS block size to 64 MB for now (a sketch of these
> > settings follows right after this list).
> >  Enabled the settings that avoid stale HDFS datanodes.
> >  The cluster I tested this on was relatively small - 10 machines.
> >  I tuned the ZooKeeper sessions to keep the heartbeat at 5 seconds for
> > the moment, and plan to decrease this value.
> >  At this point dfs.heartbeat.interval is set at the default 3 seconds,
> > but I also plan to decrease it and run a more intensive test.
> >  (Decreasing these times is based on experience with our current system,
> > which is configured at 1.2 seconds and hasn't had any issues even under
> > heavy load; obviously stop-the-world GC pauses must stay shorter than the
> > heartbeat interval.)
> >  I also remember changing the client's reconnect intervals to let it
> > reconnect to a reassigned region as fast as possible.
> >  I am at an early stage of experimenting with HBase, but there are a lot
> > of things to test/check...
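
For anyone wanting to reproduce the experiment, here is a minimal sketch of
those settings on an HBase mini cluster. The property names are the standard
Hadoop/HBase ones; the block size, heartbeat, and session timeout match the
values mentioned above, the stale-datanode interval is just an example, and
in a real deployment the dfs.* keys belong in hdfs-site.xml on the NameNode
and DataNodes rather than in client code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseTestingUtility;

// Rough test harness: start a mini cluster with the recovery-related settings
// discussed above, then kill a region server and time the recovery.
public class RecoveryTuningSketch {
  public static void main(String[] args) throws Exception {
    HBaseTestingUtility util = new HBaseTestingUtility();
    Configuration conf = util.getConfiguration();

    conf.setLong("dfs.blocksize", 64L * 1024 * 1024);            // 64 MB HDFS blocks
    conf.setBoolean("dfs.namenode.avoid.read.stale.datanode", true);
    conf.setBoolean("dfs.namenode.avoid.write.stale.datanode", true);
    conf.setLong("dfs.namenode.stale.datanode.interval", 20000); // example: mark a DN stale after 20s
    conf.setInt("dfs.heartbeat.interval", 3);                    // DN heartbeat, seconds (default)
    conf.setInt("zookeeper.session.timeout", 5000);              // ZK session timeout, ms (the "5 seconds" above)

    util.startMiniCluster(3);    // 3 region servers / datanodes
    try {
      // ... load some data, kill one region server, measure recovery time ...
    } finally {
      util.shutdownMiniCluster();
    }
  }
}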
> >
> >
> >
> >
> > On Tue, Apr 15, 2014 at 11:03 PM, Vladimir Rodionov
> > <vladrodionov@gmail.com> wrote:
> >
> > > *We also had a global HDFS file limit to contend with*
> > >
> > > Yes, we have been seeing this from time to time in our production
> > > clusters.
> > > Periodic purging of old files helps, but the issue is obvious.
> > >
> > > -Vladimir Rodionov
> > >
> > >
> > > On Tue, Apr 15, 2014 at 11:58 AM, Stack <stack@duboce.net> wrote:
> > >
> > > > On Mon, Apr 14, 2014 at 1:47 PM, Claudiu Soroiu <csoroiu@gmail.com>
> > > > wrote:
> > > >
> > > > > ....
> > > >
> > > > > After some tuning I managed to
> > > > > reduce it to 8 seconds in total and for the moment it fits the
> > > > > needs.
> > > > >
> > > >
> > > > What did you do Claudiu to get the time down?
> > > > Thanks,
> > > > St.Ack
> > > >
> > >
> >
>
