hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: HBase region server failure issues
Date Tue, 15 Apr 2014 16:20:21 GMT
Setting the # of WALs to roughly spindles / replication factor seems
intuitive. Would be interesting to benchmark.

As for one WAL per region, the BigTable paper IIRC says they didn't because
of concerns about the number of seeks in the filesystems underlying GFS and
because it would reduce the effectiveness of the group commit throughput
optimization. If WALs are backed by SSD, certainly the first consideration
no longer holds. We also had a global HDFS file limit to contend with. I
know HDFS is incrementally improving the scalability of a namespace, but
this is still an active consideration. (Or we could try partitioning a
deploy over a federated namespace? Could be "interesting". Has anyone tried
that? I haven't heard.)
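
A quick sketch of the arithmetic being discussed (helper names are
hypothetical, and round-robin sharding is just an illustrative choice, not
necessarily the actual multi-WAL grouping strategy):

```python
# Hypothetical sketch: WAL count per region server ~ spindles / replication
# factor, with regions sharded across those WALs.

def wal_count(spindles, replication_factor):
    # At least one WAL, even on a machine with few disks.
    return max(1, spindles // replication_factor)

def shard_regions(num_regions, num_wals):
    # Round-robin assignment of region indices to WAL groups (illustrative
    # only -- the real grouping policy is an implementation detail).
    groups = [[] for _ in range(num_wals)]
    for region in range(num_regions):
        groups[region % num_wals].append(region)
    return groups

wals = wal_count(12, 3)          # 12 spindles / 3x replication -> 4 WALs
groups = shard_regions(1000, wals)
print(wals, [len(g) for g in groups])  # 4 [250, 250, 250, 250]
```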

On Tue, Apr 15, 2014 at 7:11 AM, Jonathan Hsieh <jon@cloudera.com> wrote:

> It makes sense to have as many WALs as # of spindles / replication factor
> per machine.  This should be decoupled from the number of regions on a
> region server.  So for a cluster with 12 spindles we should likely have at
> least 4 WALs (12 spindles / 3 replication factor), and we need to do
> experiments to see if going to 8 or some higher number makes sense (the new
> WAL uses a disruptor pattern, which avoids much contention on individual
> writes).  So with your example, your 1000 regions would get sharded into
> the 4 WALs, which would maximize I/O throughput and disk utilization, and
> reduce time for recovery in the face of failure.
> In the case of an SSD world, it makes more sense to have one WAL per node
> once we have decent HSM support in HDFS.  The key win here will be in
> recovery time -- if any RS goes down we only have to replay a region's
> edits and not have to split or demux different regions' edits.
> Jon.
> On Mon, Apr 14, 2014 at 10:37 PM, Vladimir Rodionov
> <vladrodionov@gmail.com> wrote:
> > Todd, how about 300 regions with 3x replication? Or 1000 regions? This
> > is going to be 3,000 files on HDFS, per one RS. When I said that it does
> > not scale, I meant exactly that.
> >
> --
> // Jonathan Hsieh (shay)
> // HBase Tech Lead, Software Engineer, Cloudera
> // jon@cloudera.com // @jmhsieh

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
