hbase-user mailing list archives

From "Puri, Aseem" <Aseem.P...@Honeywell.com>
Subject RE: Some HBase FAQ
Date Tue, 14 Apr 2009 05:48:37 GMT

-----Original Message-----
From: Erik Holstad [mailto:erikholstad@gmail.com] 
Sent: Monday, April 13, 2009 9:47 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Some HBase FAQ

On Mon, Apr 13, 2009 at 7:12 AM, Puri, Aseem
<Aseem.Puri@honeywell.com> wrote:

> Hi
>
> I am a new HBase user and have some doubts regarding the functionality
> of HBase. I am working with HBase and things are going fine, but I am
> not clear about how things are happening. Please help me by answering
> these questions.
>
>
>
> 1.      I am inserting data into an HBase table and all regions get
> balanced across the various regionservers. But what happens when the
> data increases and there is not enough space on the regionservers to
> accommodate all regions? Will some regions stay on the regionservers
> while others remain in HDFS but not on a regionserver, or will the
> HBase regionservers stop taking new data?
>
Not really sure what you mean here, but if you are asking what to do
when you are running out of disk space on the regionservers, the answer
is to add another machine or two.

--- What I want to ask is this: an HBase regionserver stores region data
in HDFS, so when the HBase master starts it loads all region data from
HDFS to the regionservers. What happens if there is not enough space on
the regionservers to accommodate new data? Are some regions swapped out
of a regionserver to create space for new regions, and swapped back in
from HDFS when needed? Or does something else happen?

>
>
>
> 2.      When I insert data into an HBase table, 3 to 4 mapfiles are
> generated for one category, but after some time all the mapfiles are
> combined into one file. Is this what we call a minor compaction?
>
When all current mapfiles and the memcache are combined into one file,
this is called a major compaction; see the BigTable paper for more details.
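
To make the minor/major distinction concrete, here is a toy sketch in
Python. It is not HBase code: the function names, the tuple layout
(row, timestamp, value), and the "pick the smallest files" rule are all
assumptions made for illustration. A minor compaction merges only some
of the files; a major compaction merges all of them into one file and,
in this simplified model, keeps only the newest version of each row.

```python
import heapq


def merge_files(files, keep_latest_only=False):
    """Merge sorted store files into one sorted list of cells.

    Each file is a list of (row, timestamp, value) tuples, sorted by row
    and then by newest timestamp first. Hypothetical layout, not HBase's.
    """
    merged = list(heapq.merge(*files, key=lambda cell: (cell[0], -cell[1])))
    if not keep_latest_only:
        return merged
    latest = []
    for cell in merged:
        # The first cell seen for a row is its newest version.
        if not latest or latest[-1][0] != cell[0]:
            latest.append(cell)
    return latest


def minor_compaction(files, count=2):
    """Merge only the `count` smallest files; leave the rest untouched."""
    by_size = sorted(files, key=len)
    picked, rest = by_size[:count], by_size[count:]
    return rest + [merge_files(picked)]


def major_compaction(files):
    """Merge every file into a single file, dropping older versions."""
    return [merge_files(files, keep_latest_only=True)]


files = [
    [("row1", 1, "a"), ("row2", 1, "b")],
    [("row1", 2, "a2")],
    [("row3", 3, "c")],
]
after_minor = minor_compaction(files)
after_major = major_compaction(files)
print(len(after_minor), len(after_major))
```

In this model, three files ending up as a single file corresponds to
what the question describes, i.e. a major compaction.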

>
>
>
> 3.      In my application, the HBase table will be updated frequently.
> Should I use some other database, such as MySQL, as an intermediate
> store to hold data temporarily and then do a bulk update on HBase, or
> should I update HBase directly? Please tell me which technique is
> better optimized for HBase.
>
HBase is fast for reads, which has so far been the main focus of
development. With 0.20 we can hopefully add even faster random reading
to make it a more well-rounded system. Is HBase too slow for you today
when writing to it, and what are your requirements?

---- I asked this question mainly about write operations. I do not have
any complex requirement. I would like your suggestion on which technique
I should follow for writes:

a. When there is an update, store the data temporarily in MySQL and
then do a bulk update on HBase.

b. When there is an update, write it directly to HBase instead of
writing it to MySQL and doing a bulk update on HBase later.

Which approach do you think is more optimized?
