hbase-user mailing list archives

From "Puri, Aseem" <Aseem.P...@Honeywell.com>
Subject RE: Some HBase FAQ
Date Tue, 14 Apr 2009 06:56:50 GMT

Ryan,

Thanks for the update. Also, please tell me what happens on a read
operation: is the required region brought into RAM or not?

Thanks & Regards
Aseem Puri


-----Original Message-----
From: Ryan Rawson [mailto:ryanobjc@gmail.com] 
Sent: Tuesday, April 14, 2009 12:23 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Some HBase FAQ

yes exactly.  The regionserver loads the index on start up in one go,
holds it in ram - then it can use this index to do small specific reads
from HDFS.
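
A minimal sketch of the idea described above: a small index is held in
memory and used to read only the needed block from a larger file. This is
not HBase's actual file format or API. The block layout, the in-memory
byte stream standing in for an HDFS file, and the names build_block_file
and read_value are all assumptions made for illustration.

```python
import bisect
import io


def build_block_file(rows, rows_per_block=2):
    """Write sorted (key, value) rows as blocks; return (file bytes, index).

    The index holds one (first_key, offset, length) entry per block, so it
    is much smaller than the data. Keys and values must not contain tabs
    or newlines in this toy layout.
    """
    buf = io.BytesIO()
    index = []
    for i in range(0, len(rows), rows_per_block):
        block = rows[i:i + rows_per_block]
        payload = "\n".join(f"{k}\t{v}" for k, v in block).encode("utf-8")
        index.append((block[0][0], buf.tell(), len(payload)))
        buf.write(payload)
    return buf.getvalue(), index


def read_value(stream, index, key):
    """Look up key using the in-memory index and one small read."""
    first_keys = [entry[0] for entry in index]
    # Find the last block whose first key is <= the requested key.
    pos = bisect.bisect_right(first_keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first block
    _, offset, length = index[pos]
    stream.seek(offset)           # small, specific read: one block only
    block = stream.read(length).decode("utf-8")
    for line in block.split("\n"):
        k, v = line.split("\t", 1)
        if k == key:
            return v
    return None


rows = [("a", "1"), ("c", "2"), ("e", "3"), ("g", "4"), ("i", "5")]
data, index = build_block_file(rows)
stream = io.BytesIO(data)         # stands in for a file on HDFS
value = read_value(stream, index, "e")
```

Only the index lives in memory between reads; the row data is fetched
one block at a time when a key is requested.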

I found that in HBase 0.20 I was using about 700 kB of RAM per 5M rows
with 40-byte values.
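
As a rough back-of-the-envelope check of the figure above, the sketch
below extrapolates it to other row counts. It assumes that index memory
scales linearly with row count, that 1 kB is 1000 bytes, and that key
sizes can be ignored when comparing against the data size; none of these
assumptions come from the measurement itself, and the function name is
made up for illustration.

```python
REF_ROWS = 5_000_000        # rows in the reported measurement
REF_INDEX_BYTES = 700_000   # about 700 kB of index RAM (1 kB = 1000 bytes assumed)
VALUE_BYTES = 40            # reported value size


def estimate_index_bytes(rows):
    """Linear extrapolation of index memory from the reported data point."""
    return rows * REF_INDEX_BYTES / REF_ROWS


# Index size relative to the value bytes alone (keys ignored).
index_fraction = REF_INDEX_BYTES / (REF_ROWS * VALUE_BYTES)

# Estimated index memory for one billion rows under the same assumptions.
billion_row_estimate = estimate_index_bytes(1_000_000_000)
```

Under these assumptions the index is a fraction of a percent of the value
data, and a billion rows would need on the order of 140 MB of index
memory. Real index size also depends on key length and block size.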

-ryan

On Mon, Apr 13, 2009 at 11:50 PM, Puri, Aseem
<Aseem.Puri@honeywell.com> wrote:

> Hi Ryan,
>
> So the Regionserver holds only the index files of the regions, not the
> actual data, which stays on HDFS?
>
> Thanks & Regards
> Aseem Puri
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Tuesday, April 14, 2009 12:16 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Some HBase FAQ
>
> HBase loads the index of the files on start-up. If you ran out of memory
> for those indexes (which are a fraction of the data size), you'd crash
> with an OOME.
>
> The index is supposed to be a smallish fraction of the total data size.
>
> I wouldn't run with less than -Xmx2000m
>
> On Mon, Apr 13, 2009 at 10:48 PM, Puri, Aseem
> <Aseem.Puri@honeywell.com> wrote:
>
> >
> > -----Original Message-----
> > From: Erik Holstad [mailto:erikholstad@gmail.com]
> > Sent: Monday, April 13, 2009 9:47 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: Some HBase FAQ
> >
> > On Mon, Apr 13, 2009 at 7:12 AM, Puri, Aseem
> > <Aseem.Puri@honeywell.com> wrote:
> >
> > > Hi
> > >
> > > I am a new HBase user and have some doubts about how HBase works.
> > > I am working with HBase and things are going fine, but I am not
> > > clear on how things happen internally. Please help me by answering
> > > these questions.
> > >
> > >
> > >
> > > 1.      I am inserting data into an HBase table and all regions get
> > > balanced across the various Regionservers. But what happens when the
> > > data grows and there is not enough space on the Regionservers to
> > > accommodate all regions? Will some regions stay on a Regionserver
> > > while others are on HDFS but not on any Regionserver, or will the
> > > HBase Regionservers stop taking new data?
> > >
> > Not really sure what you mean here, but if you are asking what to do
> > when you are running out of disk space on the regionservers, the
> > answer is to add another machine or two.
> >
> > --- I want to ask this: the HBase RegionServer stores region data on
> > HDFS, so when the HBase master starts it loads all region data from
> > HDFS to the regionserver. What is the scenario if there is not enough
> > space on the regionservers to accommodate new data? Are some regions
> > swapped out of the regionserver to create space for new regions, and
> > swapped back in from HDFS when needed? Or will something else happen?
> >
> > >
> > >
> > >
> > > 2.      When I insert data into an HBase table, 3 to 4 mapfiles are
> > > generated for one category, but after some time all the mapfiles are
> > > combined into one file. Is this what we call minor compaction?
> > >
> > When all current mapfiles and the memcache are combined into one file,
> > this is called a major compaction; see the BigTable paper for more
> > details.
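
A minimal sketch of the merge step described above, assuming each mapfile
and the memcache can be modelled as a sorted list of (key, value) pairs
with unique keys, ordered oldest to newest. It is not HBase's
implementation: deletes, timestamps and multiple versions are ignored, and
the function name major_compact is made up for illustration.

```python
import heapq


def major_compact(stores):
    """Merge sorted stores (oldest first) into one sorted list.

    When the same key appears in several stores, the value from the
    newest store wins.
    """
    # Tag each entry with the negated store age so that, for equal keys,
    # the newest store's entry sorts first in the merged stream.
    tagged = (
        [(key, -age, value) for key, value in store]
        for age, store in enumerate(stores)
    )
    merged = []
    for key, _, value in heapq.merge(*tagged):
        if not merged or merged[-1][0] != key:
            merged.append((key, value))  # first seen is the newest
    return merged


old_file = [("a", "1"), ("c", "old")]
new_file = [("b", "2"), ("c", "new")]
memcache = [("d", "4")]              # in-memory store, treated as newest
compacted = major_compact([old_file, new_file, memcache])
```

Because every input is already sorted, the merge streams through them
once and writes a single sorted output.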
> >
> > >
> > >
> > >
> > > 3.      In my application, a table in HBase will be updated
> > > frequently. Should I use some other database, like MySQL, as an
> > > intermediate store to hold data temporarily and then do a bulk
> > > update on HBase, or should I do the updates directly on HBase?
> > > Please tell me which technique is better optimized for HBase.
> > >
> > HBase is fast for reads, which has so far been the main focus of
> > development; with 0.20 we can hopefully even add fast random reading
> > to it to make it a more well-rounded system. Is HBase too slow for you
> > today when writing to it, and what are your requirements?
> >
> > ---- Basically I asked this question about write operations. There is
> > no complex requirement. I want your suggestion on which technique I
> > should follow for write operations:
> >
> > a. When there is an update, store the data temporarily in MySQL and
> > then do a bulk update on HBase.
> >
> > b. When there is an update, apply it directly to HBase instead of
> > writing it to MySQL and doing a bulk update on HBase later.
> >
> > Which approach do you think is more optimized?
> >
>
