hbase-user mailing list archives

From Bill Q <bill.q....@gmail.com>
Subject Re: Questions about HBase load balancing and HFile
Date Mon, 20 Jan 2014 14:17:09 GMT
Hi Ted and Bharath,
Thanks a lot for the replies.

For question #1, if an RS is under heavy load because it is serving two hot
regions, will the HMaster move one of the two regions to another RS, or
will it split both of them and move the newly created halves to other
RSs?

For question #3, does this mean that an HFile consists of many 64k blocks, but
is itself around 64M (or 128M) in size?
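
To make the question concrete, here is a rough sketch of the arithmetic. It
assumes a 64k HFile block size and an HFile of 64M or 128M; these are only the
illustrative numbers from this thread, and the function name is made up for
the example. It ignores index, bloom and trailer blocks, and the fact that
real blocks are not exactly 64k.

```python
KB = 1024
MB = 1024 * KB


def blocks_in_hfile(hfile_bytes: int, block_bytes: int = 64 * KB) -> int:
    """Approximate number of data blocks in an HFile of the given size.

    Hypothetical helper for illustration only: it assumes every block is
    exactly block_bytes long and rounds the last partial block up.
    """
    if hfile_bytes < 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division, avoids floating point rounding.
    return -(-hfile_bytes // block_bytes)


if __name__ == "__main__":
    for size in (64 * MB, 128 * MB):
        count = blocks_in_hfile(size)
        print(f"{size // MB}M HFile -> about {count} blocks of 64k")
```

So under these assumptions one HFile would hold on the order of a thousand or
more small blocks, rather than being 64k itself.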


Many thanks.


Bill


On Mon, Jan 20, 2014 at 1:49 AM, Bharath Vissapragada <bharathv@cloudera.com> wrote:

> For question #3, the block size Lars talks about is the block size inside an
> HFile, which is different from the HDFS block size. Look at
> http://hbase.apache.org/book/apes03.html . An HFile is indexed by block to
> facilitate random access to data, so that we can skip unnecessary disk
> blocks during gets/scans. The smaller the HFile block size, the better the
> random read performance. You can see the detailed HFile layout in that link.
>
> For question #4, you are correct. Since the data resides on HDFS, each
> region server has access to all the store files (they just use the HDFS API
> to read them). They are still available after a (RS+datanode) crash
> because of the replication in HDFS. The store files still have valid
> replicas, and the namenode tries to maintain the replication factor by
> re-replicating them eventually.
>
>
> On Mon, Jan 20, 2014 at 12:08 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > For question #1, there is a load balancer in HMaster which does the job of
> > balancing region load.
> >
> > For number 2, the daughter regions stay on the same server as the parent
> > after the split. Later one or both of them may be moved to other region
> > servers.
> >
> > Cheers
> >
> > On Jan 19, 2014, at 10:27 PM, Bill Q <bill.q.hdp@gmail.com> wrote:
> >
> > > Hi,
> > > I am trying to get more information about HBase. I would appreciate some
> > > answers to these few questions. Thanks a lot.
> > >
> > > 1. About load balancing: does HMaster monitor overloaded or lightly loaded
> > > HRegionServers, and move some regions from the hot HRegionServer to
> > > lightly loaded ones (with or without adding new servers to the cluster,
> > > respectively)?
> > >
> > > 2. About region splitting: when splitting a region, will the newly created
> > > regions stay on the current HRegionServer, or will HMaster assign some new
> > > HRegionServers to take the two newly created regions?
> > >
> > > 3. About HFile size: Lars mentioned here
> > > http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
> > > that the HFile size defaults to 64k. How does this work when the default
> > > HDFS block is 64M/128M? Would the small HFile size waste lots of space on
> > > HDFS?
> > >
> > > 4. About data locality: if an HRegionServer fails, the HMaster would assign
> > > a new HRegionServer to take its place. But shouldn't this new HRegionServer
> > > have access to the storeFiles? I assumed that's how it works, by
> > > using HDFS's data replication. But after some reading, I got confused. It
> > > seems that the new HRegionServer can work without having the storeFiles
> > > data locally. How does this work at all?
> > >
> > > Many thanks.
> > >
> > >
> > > Bill
> >
>
>
>
> --
> Bharath Vissapragada
> <http://www.cloudera.com>
>
