hbase-user mailing list archives

From Greg Ross <gregr...@ngmoco.com>
Subject Re: long garbage collecting pause
Date Tue, 02 Oct 2012 15:32:13 GMT
Thanks for the suggestions.

I was attempting to tune the GC via mapred.child.java.opts in the job's
Oozie config instead of in hbase-env.sh, which I now think is why my
efforts were to no avail: that setting affects only the MapReduce child
JVMs, so it was likely having no effect on HBase's read/write performance.
Is there any way of specifying job-specific HBase parameters instead of
setting them globally in hbase-env.sh?
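For the client side at least, one common pattern is to pass HBase client properties through the Hadoop job configuration rather than hbase-env.sh. A minimal sketch, assuming the job's main class uses ToolRunner/GenericOptionsParser so that -D properties are picked up (the jar name, class name, and paths below are placeholders; hbase.client.scanner.caching and hbase.client.write.buffer are standard client settings, but useful values depend on the workload):

```
# Client-side HBase tuning passed per job via -D; these override the
# defaults only for this job's tasks, not for the region servers.
hadoop jar my-mr-job.jar com.example.MyJob \
  -Dhbase.client.scanner.caching=500 \
  -Dhbase.client.write.buffer=8388608 \
  /input/path /output/path
```

The same properties can equally be set in the Oozie action configuration. Region-server GC itself, however, runs in the server JVMs, so those JVM flags really can only live in hbase-env.sh (HBASE_OPTS / HBASE_REGIONSERVER_OPTS) and apply cluster-wide.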

The cluster has 175 nodes, each with 48GB of RAM. The overall input data
size is 7TB, and I pre-split the table into 30 regions initially, then 100
in another attempt. Each job runs on a 700GB chunk of the data. I used
RegionSplitter to create and condition the table, so there's currently no
compression. I'm thinking of recreating the table and 'alter'ing it with
LZO compression before attempting the jobs again.
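For reference, enabling LZO on an existing table rather than recreating it might look roughly like this in the HBase shell. A hedged sketch: 'mytable' and 'cf' are placeholder names, and the LZO codec must already be installed and tested on every node:

```
# Switch an existing column family to LZO, then rewrite its HFiles
# ('mytable' and 'cf' are placeholder names).
disable 'mytable'
alter 'mytable', {NAME => 'cf', COMPRESSION => 'LZO'}
enable 'mytable'
major_compact 'mytable'   # existing data is re-encoded as files are rewritten
```

Data written before the alter stays in its old encoding until a compaction rewrites it, which is why the major_compact step is there.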



On Tue, Oct 2, 2012 at 7:20 AM, Damien Hardy <dhardy@viadeoteam.com> wrote:

> Hello
> 2012/10/2 Marcos Ortiz <mlortiz@uci.cu>
>> Another thing that I'm seeing is that one of your main processes is
>> compaction, so you can optimize all this by increasing the size of your
>> regions (by default the size of a region is 256 MB), but you will have
>> on your hands a "split/compaction storm", as Lars calls them in his
>> book.
> Actually it seems the default value for hbase.hregion.max.filesize in
> 0.92 was increased to 1 GB.
> http://hbase.apache.org/book/upgrade0.92.html#d2051e266
> But you can set it higher (the recommended max is 20 GB) and split
> manually.
> http://hbase.apache.org/book/important_configurations.html#bigger.regions
> Cheers,
> --
> Dam
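For what it's worth, the larger-region setting discussed above can also be applied per table rather than cluster-wide. A rough HBase shell sketch ('mytable' is a placeholder; the cluster-wide default is hbase.hregion.max.filesize in hbase-site.xml):

```
# Raise the split threshold for one table to 10 GB
# ('mytable' is a placeholder name).
disable 'mytable'
alter 'mytable', MAX_FILESIZE => '10737418240'
enable 'mytable'
```

A per-table setting like this lets a pre-split bulk-load table use large regions without changing behavior for every other table on the cluster.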

