hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: lack of region merge cause in_memory option trouble
Date Sun, 19 Sep 2010 14:37:26 GMT
Hi Jimmy,

IN_MEMORY may not mean what you think. It does not turn off disk persistence, flushing, etc.
It is a suggestion to the regionserver that all of the data for the region be retained in
block cache. 

Also, as I said before, your test case is not really what the current TTL implementation targets.
If you want it to work better for you given such short TTLs, it may make sense to modify the
memstore to simply not flush values with short TTLs, if they will expire in a few minutes
or seconds. 
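
To make the idea concrete, here is a minimal sketch of that flush-time decision. It is not HBase code: the `Cell` class, the `split_for_flush` function, and the `hold_back_ms` threshold are hypothetical names invented for illustration, and a real change would live in the regionserver's Java memstore code. The assumption is that each value expires at its write timestamp plus the TTL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a stored value; not an HBase class."""
    row: str
    timestamp_ms: int  # write time in milliseconds


def split_for_flush(cells, now_ms, ttl_ms, hold_back_ms):
    """Split memstore contents into (flush, hold) lists.

    A cell expires at timestamp_ms + ttl_ms. Cells that have already
    expired are discarded. Cells that expire within hold_back_ms are
    kept in memory instead of being written to disk. Everything else
    is flushed as usual.
    """
    flush, hold = [], []
    for cell in cells:
        remaining_ms = cell.timestamp_ms + ttl_ms - now_ms
        if remaining_ms <= 0:
            continue  # already expired: nothing to keep or write
        if remaining_ms <= hold_back_ms:
            hold.append(cell)  # about to expire: not worth a disk write
        else:
            flush.append(cell)
    return flush, hold
```

With a 10 minute TTL (600000 ms) and a 2 minute hold-back, a value written 8.5 minutes ago would stay in memory and a value written 10 seconds ago would be flushed. Note the trade-off: anything held back still occupies memstore memory until it expires.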

> The idea is that we are only interested in last 10 minute's data,
> as data gets older, it will be purged, and the amount of memory
> and disk usage will remain low. [...]

What is the anticipated data volume within that 10 minute window? Will it fit all in RAM on
a single server? Or perhaps a small cluster of servers?
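
A rough way to answer that question is to multiply write rate, average value size, and window length. The sketch below does only that arithmetic; the function name and every number in it are placeholder assumptions, not measurements from your test, and it ignores per-value storage overhead and replication.

```python
def live_window_bytes(writes_per_sec, avg_value_bytes, window_sec):
    """Approximate bytes of unexpired data for a fixed retention window."""
    return writes_per_sec * avg_value_bytes * window_sec


# Hypothetical inputs: 5,000 writes/s of 200-byte values, 10 minute window.
live = live_window_bytes(writes_per_sec=5_000, avg_value_bytes=200, window_sec=600)

# Hypothetical memory budget for a single server: 4 GiB.
ram_budget = 4 * 1024**3
fits_on_one_server = live <= ram_budget
```

Plugging in your own rate and value size gives the steady-state volume that has to be held at any moment, which is the number that decides between one server and a small cluster.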

The BigTable/HBase design targets large data scale, and the implementation is optimized for
that: a distributed, elastic, **persistent** sparse map with multidimensional keys. What you
are talking about here is way on the other end of the spectrum, and persistence may not be something
you want. 

   - Andy

> From: Jinsong Hu <jinsong_hu@hotmail.com>
> Subject: lack of region merge cause in_memory option trouble
> To: user@hbase.apache.org
> Date: Friday, September 17, 2010, 2:53 PM
> Hi,
>  I was trying to find out if the hbase can be used in
> real-time processing scenario. In order to
> do so, I set the in_memory for a table to be true, and set
> the TTL for the table to 10 minutes.
> The data comes in chronological order. I let the test
> run for 1 day. The idea is that we are only
> interested in the last 10 minutes' data. As data gets older, it
> will be purged, and the amount of memory and disk usage will
> remain low.
>  What I found is that the number of regions continues to grow,
> and overnight it created 46 regions. HDFS shows it used
> 8.6G of disk space. This is one order of magnitude higher
> than what I estimate in the ideal case. The data rate that I
> am pumping is only 3 regions/hour. I would imagine that we
> will only have less than 3 regions in hbase for this kind of
> situation, and only 700M in terms of HDFS usage, regardless
> how long I run the test.
>  I understand that the region merge request is already
> filed. Does anybody know when that will be implemented ?
> 
> Jimmy. 
> 


      

