From: Todd Lipcon <todd@cloudera.com>
Date: Mon, 14 Apr 2014 20:22:01 -0700
Subject: Re: HBase region server failure issues
To: dev@hbase.apache.org

On Mon, Apr 14, 2014 at 6:32 PM, Vladimir Rodionov wrote:

> *On the other hand, 95% of HBase users don't actually configure HDFS to
> fsync() every edit. Given that, the random writes aren't actually going to
> cause one seek per write -- they'll get buffered up and written back
> periodically in a much more efficient fashion.*
>
> Todd, this is in theory. Reality is different. 1 writer is definitely more
> efficient than 100. This won't scale well.

I'd actually disagree. 100 is probably significantly faster than 1, given
that most machines have 12 spindles. So, yes, you'd be multiplexing 8 or so
logs per spindle, but even 100 logs only require a few hundred MB worth of
buffer cache to get good coalescing of writes into large physical IOs. If
memory is really constrained on your machine, you'll probably get some
throughput collapse as you enter some really inefficient dirty-page
throttling, but as long as you leave a few GB unallocated, I bet the
reality is much closer to what I said than you might think.
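A quick back-of-envelope sketch of that arithmetic (the 4 MB write-back
budget per log is an assumed illustrative figure, not a measured one):

public class WalBufferEstimate {
    public static void main(String[] args) {
        int activeWals = 100;    // e.g. one WAL per active region on the server
        int spindles = 12;       // typical data node
        int bufferPerWalMb = 4;  // assumed per-log dirty-page budget

        // roughly 8-9 logs multiplexed onto each spindle
        System.out.println("logs per spindle : " + (activeWals + spindles - 1) / spindles);
        // ~400 MB of page cache to coalesce small appends into large physical IOs
        System.out.println("buffer cache (MB): " + activeWals * bufferPerWalMb);
    }
}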
-Todd


> On Mon, Apr 14, 2014 at 6:20 PM, Todd Lipcon wrote:
>
> > On the other hand, 95% of HBase users don't actually configure HDFS to
> > fsync() every edit. Given that, the random writes aren't actually going
> > to cause one seek per write -- they'll get buffered up and written back
> > periodically in a much more efficient fashion.
> >
> > Plus, in some small number of years, I believe SSDs will be available on
> > most server machines (in a hybrid configuration), so the seeks will cost
> > less even with fsync on.
> >
> > -Todd
> >
> >
> > On Mon, Apr 14, 2014 at 4:54 PM, Vladimir Rodionov wrote:
> >
> > > I do not think it's a good idea to have one WAL file per region. The
> > > single-WAL-file design rests on the assumption that writing data
> > > sequentially reduces average latency and increases total throughput.
> > > That is no longer the case with one WAL file per region: you may have
> > > hundreds of active regions per RS, so all the sequential writes become
> > > random ones, and random IO on rotational media is very bad, very bad.
> > >
> > > -Vladimir Rodionov
> > >
> > >
> > > On Mon, Apr 14, 2014 at 2:41 PM, Ted Yu wrote:
> > >
> > > > There is an ongoing effort to address this issue.
> > > >
> > > > See the following:
> > > > HBASE-8610  Introduce interfaces to support MultiWAL
> > > > HBASE-10378 Divide HLog interface into User and Implementor specific
> > > > interfaces
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Mon, Apr 14, 2014 at 1:47 PM, Claudiu Soroiu wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > My name is Claudiu Soroiu and I am new to HBase/Hadoop, but not new
> > > > > to distributed computing in FT/HA environments, and I see there are
> > > > > a lot of issues reported related to region server failure.
> > > > >
> > > > > The main problem I see is the recovery time after a node failure and
> > > > > distributed log splitting. After some tuning I managed to reduce it
> > > > > to 8 seconds in total, and for the moment that fits my needs.
> > > > >
> > > > > I have one question: *Why is there only one WAL file per region
> > > > > server and not one WAL per region itself?*
> > > > > I haven't found the exact answer anywhere, which is why I'm asking on
> > > > > this list -- please point me in the right direction if I missed it.
> > > > >
> > > > > My point is that eliminating the need to split a log after a failure
> > > > > reduces the downtime for the regions; the only remaining delay would
> > > > > be transferring data over the network to the region servers that take
> > > > > over the failed regions.
> > > > > This is feasible only if having multiple WALs per region server does
> > > > > not hurt overall write performance.
> > > > >
> > > > > Thanks,
> > > > > Claudiu
> > > > >
> > > >
> > >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>

--
Todd Lipcon
Software Engineer, Cloudera
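The log-splitting cost discussed in the quoted thread comes from the fact
that a single region-server WAL interleaves edits from every region it
hosts, so recovery has to regroup them per region before replay. A minimal
illustrative sketch of that regrouping step (WalEntry and splitByRegion are
hypothetical simplifications, not HBase's actual WAL classes):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WalSplitSketch {
    // Hypothetical, simplified stand-in for a WAL record (not the real HBase type).
    record WalEntry(String region, long seqId, String edit) {}

    // With one shared WAL per region server, entries from many regions are
    // interleaved in arrival order; recovery must regroup them per region so
    // each server taking over a region can replay only its own edits.
    static Map<String, List<WalEntry>> splitByRegion(List<WalEntry> sharedWal) {
        Map<String, List<WalEntry>> perRegion = new HashMap<>();
        for (WalEntry e : sharedWal) {
            perRegion.computeIfAbsent(e.region(), r -> new ArrayList<>()).add(e);
        }
        return perRegion;
    }

    public static void main(String[] args) {
        List<WalEntry> sharedWal = List.of(
            new WalEntry("regionA", 1, "put row1"),
            new WalEntry("regionB", 2, "put row7"),
            new WalEntry("regionA", 3, "delete row2"));
        // One WAL per region would make this grouping unnecessary at recovery
        // time; the debate above is about what that costs on the write path.
        splitByRegion(sharedWal).forEach((region, entries) ->
            System.out.println(region + " -> replay " + entries.size() + " edits"));
    }
}

A per-region WAL (or any multi-WAL scheme like the work Ted points to)
trades this recovery-time regrouping for more concurrent writers on the
write path, which is exactly the tension between the write-coalescing
argument and the random-IO argument above.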