hbase-dev mailing list archives

From "Kevin O'dell" <kevin.od...@cloudera.com>
Subject Re: HBase region server failure issues
Date Tue, 15 Apr 2014 16:39:33 GMT
Andrew,

  I agree, there is definitely a chance HDFS doesn't have an extra 3GB of
NN heap to squeeze out for HBase.  It would be interesting to check in with
the Flurry guys and see what their NN pressure looks like.  As clusters
become more multi-tenant, HDFS pressure could become a real concern.  I have
not seen too many clusters that have a ton of files and are choking the NN
into large GC pauses; usually the end user is doing something wrong, and we
can use something like HAR to help clean up some of the FS.
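
For a rough sense of scale, here is a back-of-the-envelope sketch in Java
using the numbers from the 1000-node example below.  The "each WAL fits in a
single 128MB block" and "~1 GB of NN heap per million files" figures are
rough assumptions of mine, not measurements:

    // Back-of-the-envelope only: estimate the extra NN heap for the WAL files
    // in the federated-namespace example (1000 nodes, 100 regions/node, 32 WALs).
    public class WalFileCountSketch {
        public static void main(String[] args) {
            long nodes = 1000;
            long regionsPerNode = 100;
            long walsPerRegion = 32;
            long walFiles = nodes * regionsPerNode * walsPerRegion; // 3,200,000

            // Rough rule of thumb (assumption): ~1 GB of NameNode heap per
            // million files, lumping each file and its single block together.
            double heapGb = walFiles / 1_000_000.0;
            System.out.printf("%,d WAL files -> on the order of %.1f GB of extra NN heap%n",
                    walFiles, heapGb);
        }
    }

That lands right around the 3GB figure above, so the concern is not academic.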


On Tue, Apr 15, 2014 at 12:29 PM, Andrew Purtell <apurtell@apache.org> wrote:

> You'd probably know better than I, Kevin, but I'd worry about the
> 1000*1000*32 case, where HDFS is as (over)committed as the HBase tier.
>
>
> On Tue, Apr 15, 2014 at 9:26 AM, Kevin O'dell <kevin.odell@cloudera.com> wrote:
>
> > In general I have never seen nor heard of Federated Namespaces in the
> > wild, so I would be hesitant to go down that path.  But you know, for
> > "Science", I would be interested in seeing how that worked out.  Would
> > we be looking at 32 WALs per region?  On a large cluster with 1000
> > nodes, 100 regions per node, and a WAL per region (I like easy math):
> >
> > 1000*100*32 = 3.2 million files for WALs.  This is not ideal, but it is
> > not horrible if we are using 128MB block sizes etc.
> >
> > I feel like I am missing something above though.  Thoughts?
> >
> >
> > On Tue, Apr 15, 2014 at 12:20 PM, Andrew Purtell <apurtell@apache.org> wrote:
> >
> > > # of WALs as roughly spindles / replication factor seems intuitive.
> > > Would be interesting to benchmark.
> > >
> > > As for one WAL per region, the BigTable paper IIRC says they didn't
> > > because of concerns about the number of seeks in the filesystems
> > > underlying GFS, and because it would reduce the effectiveness of the
> > > group commit throughput optimization. If WALs are backed by SSD,
> > > certainly the first consideration no longer holds. We also had a
> > > global HDFS file limit to contend with. I know HDFS is incrementally
> > > improving the scalability of a namespace, but this is still an active
> > > consideration. (Or we could try partitioning a deploy over a federated
> > > namespace? Could be "interesting". Has anyone tried that? I haven't
> > > heard.)
> > >
> > >
> > >
> > > On Tue, Apr 15, 2014 at 7:11 AM, Jonathan Hsieh <jon@cloudera.com> wrote:
> > >
> > > > It makes sense to have as many WALs as # of spindles / replication
> > > > factor per machine.  This should be decoupled from the number of
> > > > regions on a region server.  So for a cluster with 12 spindles we
> > > > should likely have at least 4 WALs (12 spindles / 3 replication
> > > > factor), and need to do experiments to see if going to 8 or some
> > > > higher number makes sense (the new WAL uses a disruptor pattern
> > > > which avoids much contention on individual writes).  So with your
> > > > example, your 1000 regions would get sharded into the 4 WALs, which
> > > > would maximize IO throughput and disk utilization, and reduce time
> > > > for recovery in the face of failure.
> > > >
> > > > In the case of an SSD world, it makes more sense to have one WAL per
> > > > region once we have decent HSM support in HDFS.  The key win here
> > > > will be in recovery time -- if any RS goes down we only have to
> > > > replay a region's edits and not have to split or demux different
> > > > regions' edits.
> > > >
> > > > Jon.
> > > >
> > > >
> > > > On Mon, Apr 14, 2014 at 10:37 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> > > >
> > > > > Todd, how about 300 regions with 3x replication?  Or 1000 regions?
> > > > > This is going to be 3000 files, on HDFS, per RS.  When I said that
> > > > > it does not scale, I meant exactly that.
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > // Jonathan Hsieh (shay)
> > > > // HBase Tech Lead, Software Engineer, Cloudera
> > > > // jon@cloudera.com // @jmhsieh
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet
> > > Hein (via Tom White)
> > >
> >
> >
> >
> > --
> > Kevin O'Dell
> > Systems Engineer, Cloudera
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
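
For anyone following along, a minimal sketch of the per-spindle WAL sharding
Jon describes above.  The class and method names here are made up for
illustration and are not the actual multi-WAL implementation:

    // Illustrative only: choose the WAL count from spindles / replication
    // factor, then shard regions across those WALs by hashing the region name.
    public class WalSharding {
        private final int numWals;

        public WalSharding(int spindles, int replicationFactor) {
            // e.g. 12 spindles / 3x replication -> 4 WALs per region server
            this.numWals = Math.max(1, spindles / replicationFactor);
        }

        // Map a region (by its encoded name) onto one of the WALs.
        public int walIndexFor(String encodedRegionName) {
            return (encodedRegionName.hashCode() & Integer.MAX_VALUE) % numWals;
        }

        public static void main(String[] args) {
            WalSharding sharding = new WalSharding(12, 3);
            System.out.println("region abc123 -> WAL #" + sharding.walIndexFor("abc123"));
        }
    }

With something like this, each region's edits land in exactly one WAL, so a
recovery only has to split several smaller WALs rather than one big one.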



-- 
Kevin O'Dell
Systems Engineer, Cloudera
