Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hbase.apache.org
Received-SPF: pass (athena.apache.org: domain of vladrodionov@gmail.com
 designates 74.125.82.46 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CALte62ywaFt7rMXkG_HugqQNLpnEv7Ku-TwyVJ-xQ=Le16ToyQ@mail.gmail.com>
References: 
 <CAFB=OSy9eCyLPGvYgZHS8X+oaXrqzXA5JL-qmABbKb7Au5XDpA@mail.gmail.com>
	<CALte62ywaFt7rMXkG_HugqQNLpnEv7Ku-TwyVJ-xQ=Le16ToyQ@mail.gmail.com>
Date: Mon, 14 Apr 2014 16:54:23 -0700
Message-ID: 
 <CAAg3a2rAEpN3=oQOPF7yTeH77F78kDAAC9rkJ8E5hWedK8-jtg@mail.gmail.com>
Subject: Re: HBase region server failure issues
From: Vladimir Rodionov <vladrodionov@gmail.com>
To: "dev@hbase.apache.org" <dev@hbase.apache.org>
Content-Type: multipart/alternative; boundary=f46d043c086420a8e604f7096750

--f46d043c086420a8e604f7096750
Content-Type: text/plain; charset=UTF-8

I do not think it's a good idea to have one WAL file per region. The whole
WAL design rests on the assumption that writing data sequentially reduces
average latency and increases total throughput. That assumption no longer
holds with one WAL file per region: you may have hundreds of active regions
per RS, so all those sequential writes become random ones, and random I/O
is very bad on rotational media.
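
To put a rough shape on the seek argument above, here is a toy model, not a
benchmark. It assumes each WAL file occupies its own contiguous area on a
single rotational disk, so switching files between two consecutive appends
costs one head seek. It ignores HDFS, group commit, and OS write buffering.
The function name, the region count, and the append count are all made up
for illustration.

```python
import random


def count_seeks(num_regions, num_appends, wal_per_region, seed=0):
    """Count how often two consecutive appends land in different files.

    Toy assumption: every file switch costs exactly one head seek.
    """
    rng = random.Random(seed)
    seeks = 0
    last_file = None
    for _ in range(num_appends):
        # Pick the region that receives the next edit (uniformly at random).
        region = rng.randrange(num_regions)
        # One shared WAL: every region appends to file 0.
        # One WAL per region: the region id selects the file.
        target = region if wal_per_region else 0
        if last_file is not None and target != last_file:
            seeks += 1
        last_file = target
    return seeks


shared = count_seeks(num_regions=200, num_appends=10_000, wal_per_region=False)
per_region = count_seeks(num_regions=200, num_appends=10_000, wal_per_region=True)
print("shared WAL seeks:", shared)
print("per-region WAL seeks:", per_region)
```

In this model the shared WAL never switches files, while with 200 active
regions nearly every append does. Real numbers depend on the hardware and
on how writes are batched, so treat this only as an illustration.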

-Vladimir Rodionov


On Mon, Apr 14, 2014 at 2:41 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> There is an ongoing effort to address this issue.
>
> See the following:
> HBASE-8610 Introduce interfaces to support MultiWAL
> HBASE-10378 Divide HLog interface into User and Implementor specific
> interfaces
>
> Cheers
>
>
> On Mon, Apr 14, 2014 at 1:47 PM, Claudiu Soroiu <csoroiu@gmail.com> wrote:
>
> > Hi all,
> >
> > My name is Claudiu Soroiu. I am new to HBase/Hadoop but not new to
> > distributed computing in FT/HA environments, and I see that a lot of
> > reported issues relate to region server failure.
> >
> > The main problem I see is the recovery time after a node failure,
> > specifically distributed log splitting. After some tuning I managed to
> > reduce it to 8 seconds in total, and for the moment that fits the needs.
> >
> > I have one question: *Why is there only one WAL file per region server
> > and not one WAL per region?*
> > I haven't found the exact answer anywhere, which is why I'm asking on
> > this list. Please point me in the right direction if I picked the wrong
> > list.
> >
> > My point is that eliminating the need to split a log after a failure
> > reduces the downtime for the regions, and the only delay we would see
> > would come from transferring data over the network to the region servers
> > that take over the failed regions.
> > This is feasible only if having multiple WALs per region server does not
> > affect the overall write performance.
> >
> > Thanks,
> > Claudiu
> >
>

--f46d043c086420a8e604f7096750--