hbase-user mailing list archives

From Bharath Vissapragada <bhara...@cloudera.com>
Subject Re: Questions about HBase load balancing and HFile
Date Mon, 20 Jan 2014 06:49:44 GMT
For question #3: the block size Lars talks about is the block size inside an
HFile, which is different from the HDFS block size. See
http://hbase.apache.org/book/apes03.html . An HFile is indexed by block to
facilitate random access to data, so that gets and scans can skip the blocks
they do not need. A smaller HFile block size generally gives better random
read performance, at the cost of a larger block index. The detailed HFile
layout is described at that link.
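
To make the indexing idea concrete, here is a minimal sketch in Python. It is
a toy model, not HBase code: the class ToyBlockIndex and its methods are
hypothetical names, and blocks are sized by key count rather than by bytes.
It only illustrates how keeping the first key of each block lets a get jump
to one block instead of reading the whole file.

```python
import bisect


class ToyBlockIndex:
    """Toy model of a block-indexed file (hypothetical, not the HBase API)."""

    def __init__(self, sorted_keys, keys_per_block):
        # Split the sorted keys into fixed-size blocks.
        self.blocks = [
            sorted_keys[i:i + keys_per_block]
            for i in range(0, len(sorted_keys), keys_per_block)
        ]
        # The index holds only the first key of every block.
        self.first_keys = [block[0] for block in self.blocks]

    def find_block(self, key):
        # Binary search for the last block whose first key is <= key.
        i = bisect.bisect_right(self.first_keys, key) - 1
        return max(i, 0)

    def get(self, key):
        # Only one block is examined; return (found, keys_examined).
        block = self.blocks[self.find_block(key)]
        return key in block, len(block)


keys = ["row%04d" % i for i in range(1000)]
small = ToyBlockIndex(keys, keys_per_block=10)    # many small blocks
large = ToyBlockIndex(keys, keys_per_block=100)   # few large blocks
```

With small blocks a get examines fewer keys per lookup, but the index itself
has more entries; with large blocks it is the other way around.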

For question #4: you are correct. Since the data resides on HDFS, every
region server has access to all the store files (they simply use the HDFS API
to read them). The files are still available after a (RS + datanode) crash
because of replication in HDFS: the store files still have valid replicas on
other datanodes, and the namenode eventually restores the replication factor
by re-replicating them.
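
The replication argument can be sketched with a small toy model in Python.
Everything here is an assumption made for illustration: the datanode names,
the block names, the replica placement and the helper functions are all
hypothetical, and real HDFS placement and recovery are more involved.

```python
def surviving_replicas(block_replicas, dead_nodes):
    """Return, per block, the replicas that are not on a dead datanode."""
    return {
        block: nodes - dead_nodes
        for block, nodes in block_replicas.items()
    }


def under_replicated(block_replicas, dead_nodes, replication_factor=3):
    """Blocks that are still readable but have fewer replicas than wanted."""
    alive = surviving_replicas(block_replicas, dead_nodes)
    return sorted(
        block for block, nodes in alive.items()
        if 0 < len(nodes) < replication_factor
    )


# Hypothetical placement: each store file block has three replicas.
placement = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn1", "dn3", "dn4"},
    "blk_3": {"dn2", "dn3", "dn4"},
}

# dn1 is the datanode co-located with the crashed region server.
alive = surviving_replicas(placement, {"dn1"})
to_fix = under_replicated(placement, {"dn1"})
```

In this model every block keeps at least one replica after dn1 is lost, so
any other region server can still read the store files, and the blocks listed
in to_fix are the ones that would need to be re-replicated.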


On Mon, Jan 20, 2014 at 12:08 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> For question #1, there is a load balancer in HMaster which does the job of
> balancing region load.
>
> For number 2, the daughter regions stay on the same server as the parent
> after split. Later one or both of them may be moved to other region servers.
>
> Cheers
>
> On Jan 19, 2014, at 10:27 PM, Bill Q <bill.q.hdp@gmail.com> wrote:
>
> > Hi,
> > I am trying to get more information about HBase. I would appreciate some
> > answers to these few questions. Thanks a lot.
> >
> > 1. About load balancing: does HMaster monitor overloaded and lightly
> > loaded HRegionServers, and move some regions from the hot HRegionServers
> > to the lightly loaded ones (with or without adding new servers to the
> > cluster)?
> >
> > 2. About region splitting: when splitting a region, will the newly created
> > regions stay on the current HRegionServer, or will HMaster assign some new
> > HRegionServers to take the newly created two regions?
> >
> > 3. About HFile size: Lars mentioned here
> > http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
> > that the HFile size defaults to 64k. How does this work while the default
> > HDFS block is 64M/128M? Would the small HFile size waste lots of space on
> > HDFS?
> >
> > 4. About data locality: if a HRegionServer fails, the HMaster would assign
> > a new HRegionServer to take its place. But should this new HRegionServer
> > have access to the storeFiles? I assumed that's how it works by
> > using HDFS's data replication. But after some reading, I got confused. It
> > seems that the new HRegionServer can work without the storeFiles data
> > being local. How does this work at all?
> >
> > Many thanks.
> >
> >
> > Bill
>



-- 
Bharath Vissapragada
<http://www.cloudera.com>
