hbase-user mailing list archives

From "Buttler, David" <buttl...@llnl.gov>
Subject RE: hbase architecture question
Date Fri, 17 Jun 2011 20:55:01 GMT
The pre-splitting question is relatively easy.  There is an optional argument in the create
table command that takes an array of keys.  These keys are used as the region start keys.
 How big should the splits be?  That is a little harder.  The rule of thumb is fewer than 1000
regions per server.  You want to change your region size (in the hbase-site.xml configuration
file) to suit your workload.  If you are mostly doing writes and sequential
scans, and very rarely doing random reads, then you want large regions (say 4GB).
 That should give you a rough estimate of about 1000 regions total for your data, which is
only 100 regions per region server for a small 10 node cluster.
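
The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration: the helper names (estimate_regions, even_split_keys) are hypothetical, and the split keys assume row keys that are fixed-width hex strings spread uniformly over the key space, which may not match the real data. For 3 TB at 4 GB per region the raw division gives a figure somewhat below the rough 1000 quoted above, in the same ballpark.

```python
import math

TB = 1024 ** 4
GB = 1024 ** 3

def estimate_regions(total_bytes, region_bytes, servers):
    """Return (total regions, regions per server), both rounded up."""
    regions = math.ceil(total_bytes / region_bytes)
    per_server = math.ceil(regions / servers)
    return regions, per_server

def even_split_keys(num_regions, key_width=8):
    """Return num_regions - 1 split keys evenly spaced over a hex key space.

    Assumption: row keys are fixed-width lowercase hex strings that are
    uniformly distributed. Real keys usually need a scheme of their own.
    """
    space = 16 ** key_width
    return [format(space * i // num_regions, "0{}x".format(key_width))
            for i in range(1, num_regions)]

# 3 TB of data, 4 GB regions, 10 region servers (figures from this thread)
regions, per_server = estimate_regions(3 * TB, 4 * GB, 10)
split_keys = even_split_keys(regions)
print(regions, per_server, split_keys[:3])
```

The resulting list is the kind of key array the create table command is described as accepting; each key becomes the start key of one region.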

Dave


-----Original Message-----
From: Hiller, Dean x66079 [mailto:dean.hiller@broadridge.com] 
Sent: Friday, June 17, 2011 1:28 PM
To: user@hbase.apache.org
Subject: RE: hbase architecture question

How do you pre-split tables and how big should the splits be?  We will be doing a 3 terabyte
load into hbase in the near future.  

We have raw files spit out from our Sybase that I can load once into hadoop dfs so we can
wipe hbase and reload the data into hbase on every run of our prototype.

Is anyone else doing prototypes with massive data and needing to clean out and restore their
data to the database like we have to do to prove this works?

Thanks,
Dean

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Wednesday, June 15, 2011 11:42 AM
To: user@hbase.apache.org
Subject: Re: hbase architecture question

Inline.

J-D

On Tue, Jun 14, 2011 at 6:48 PM, Sam Seigal <selekt86@yahoo.com> wrote:
> Hi All,
>
> I had some questions about the hbase architecture that I am a little
> confused about.
>
> After reading material on the internet, the HBase book, etc., my understanding of
> regions is the following:
>
> When the cluster initially starts up (with no data), the regionservers come
> online.
>
> When the data starts flowing into HBase, the data starts being written into
> a single region. Each region is composed of a memstore, storefiles and a
> WAL.

There's only one WAL per region server and one memstore per family per
region. See http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
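
As a rough mental model of that layout, here is a toy sketch in Python. The classes and method names are hypothetical and are not HBase's own; the sketch only shows that every write on a server appends to the same log, while the in-memory buffers are kept per region and per column family.

```python
class Region:
    """Toy region: one in-memory buffer (memstore) per column family."""
    def __init__(self, name, families):
        self.name = name
        self.memstores = {family: {} for family in families}

class RegionServer:
    """Toy region server: a single write-ahead log shared by all regions."""
    def __init__(self):
        self.wal = []      # one log for the whole server
        self.regions = {}  # region name -> Region

    def add_region(self, region):
        self.regions[region.name] = region

    def put(self, region_name, family, row, value):
        region = self.regions[region_name]
        # Record the edit in the shared log first, then buffer it in memory.
        self.wal.append((region_name, family, row, value))
        region.memstores[family][row] = value

server = RegionServer()
server.add_region(Region("r1", ["cf1", "cf2"]))
server.add_region(Region("r2", ["cf1"]))
server.put("r1", "cf1", "row-a", "1")
server.put("r1", "cf2", "row-a", "2")
server.put("r2", "cf1", "row-z", "3")
```

After these three puts the server holds one log with three entries and three separate memstores, two for r1 and one for r2.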

> As memstores get filled up, they are written to storefiles on disk. A
> major compaction combines these storefiles together into a single store
> file. Storefiles/memstores exist on a per column family basis.
>
> What I do not understand is the following - the StoreFiles for a particular
> region/column family are written into HDFS as an HFile. This HFile then
> exists in HDFS as chunks (usually of 64 MB size) across the HDFS data nodes.
> When the region "splits", what happens to these store files ?

After the split, each daughter region rewrites its half of every store
file during a compaction, so each file ends up split in two.

>
> If a region is already split into multiple store files and a compaction runs
> and a split was supposed to occur, the compaction process will only compact
> half of the files ?

You can't compact and split at the same time.

> If a region only has a single store file, how is the data split into two ?
> Is half of the data written into a separate file and then the original file
> truncated ? Isn't this an expensive operation ?

Like I said previously, the daughter regions will split the files into
two new files each. It uses disk, yeah, but you shouldn't have more
than a few splits per day. That's also one of the reasons why we
recommend pre-splitting tables when doing an initial insert.
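
The end result described here can be pictured with a small Python sketch. It is a simplification under stated assumptions: store files are modelled as sorted lists of (key, value) pairs, the split key is taken from the middle of the largest file, and the function names are hypothetical. It models only the outcome (every parent file rewritten into a lower and an upper file), not how or when HBase does the rewriting.

```python
def split_store_file(sorted_cells, split_key):
    """Partition one sorted store file into (lower, upper) at split_key."""
    lower = [(k, v) for k, v in sorted_cells if k < split_key]
    upper = [(k, v) for k, v in sorted_cells if k >= split_key]
    return lower, upper

def split_region(store_files):
    """Split every store file of a region at a single split key.

    Assumption: the split key is the middle key of the largest file.
    Returns (split_key, files of daughter A, files of daughter B).
    """
    largest = max(store_files, key=len)
    split_key = largest[len(largest) // 2][0]
    daughter_a, daughter_b = [], []
    for cells in store_files:
        lower, upper = split_store_file(cells, split_key)
        daughter_a.append(lower)
        daughter_b.append(upper)
    return split_key, daughter_a, daughter_b

parent_files = [
    [("a", 1), ("c", 3), ("e", 5), ("g", 7)],
    [("b", 2), ("f", 6)],
]
split_key, daughter_a, daughter_b = split_region(parent_files)
```

Every cell is read and written once, which is the disk cost mentioned above, and pre-splitting avoids paying it repeatedly during a bulk load.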

>
> Secondly, what does it mean to "move" a region from one regionserver to
> another. Since a region is composed of a set of HFiles on the HDFS
> filesystem, and can potentially exist anywhere across the datanode cluster,
> is any data physically being moved from one machine to another ? or is it
> just that the regionserver that the region has "moved" to takes control of
> the file on HDFS that needs to be written to ?

The responsibility is moved.

>
> I guess what I am not able to fully understand is how HBase
> effectively uses HDFS when it comes to region distribution. If a
> regionserver is hosting a hot region whose blocks sit on a certain set of
> datanodes, how does moving this region to another regionserver help ? The
> traffic ultimately would land on the same set of datanodes holding the
> different chunks of the region .. shouldn't Hbase then be doing load
> balancing based on where the regions are physically located on the datanodes
> as opposed to where they are *logically* located ?

It's more complicated than that.

When you write to HDFS, the first replica of each block always goes to
the local datanode (if it exists) and the two others will go elsewhere.
It means that a single biggish store file can be spread across almost all
nodes in a cluster. Moving the region effectively forces the DFSClient to
read from different locations rather than just the local one.
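
A toy simulation of that placement rule shows why one file ends up touching many nodes. This is a sketch under simplifying assumptions: the remaining copies are chosen uniformly at random, rack awareness is ignored, and the function and node names are hypothetical.

```python
import random

def place_block(local_node, all_nodes, replication=3, rng=random):
    """Place one block: first copy on the writer's node, the rest elsewhere."""
    others = [n for n in all_nodes if n != local_node]
    return [local_node] + rng.sample(others, replication - 1)

def place_file(num_blocks, local_node, all_nodes, rng=random):
    """Place every block of a file written from local_node."""
    return [place_block(local_node, all_nodes, rng=rng)
            for _ in range(num_blocks)]

nodes = ["dn{}".format(i) for i in range(10)]
rng = random.Random(0)  # fixed seed so repeated runs give the same layout
placement = place_file(16, "dn0", nodes, rng=rng)  # 16 blocks, about 1 GB at 64 MB each
nodes_touched = {node for block in placement for node in block}
print(len(nodes_touched), "of", len(nodes), "datanodes hold part of the file")
```

Every block keeps one copy on dn0, so the original region server reads locally, while a server that takes over the region has to fetch most blocks from wherever the other copies landed.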

Another thing to keep in mind is flushing/compacting which should
eventually move the data local to the new region location.

Then maybe it's not even the IO that's causing the issue, maybe it's
because the region is thrashing the block cache or requires a lot of
CPU for different reasons or is being written to a lot and keeps
flushing and compacting.

Even then, it's not ideal. Load balancing is still very dumb in HBase
and there's a lot more work to do. The kind of work I'd like to see a
few PhD students picking up :)

>
> Please let me know if I am missing something truly fundamental here.

Nothing fundamental, but a lot of the answers you were seeking can be
found in the code. It's actually quite easy to read.

> Sam
>


