hbase-user mailing list archives

From Nick maillard <nicolas.maill...@fifty-five.com>
Subject Re: Hbase import Tsv performance (slow import)
Date Wed, 24 Oct 2012 09:23:14 GMT
Thanks for your help

I have taken my replication down to 2, but if I am not mistaken, replication also
has the benefit of making the cluster more fault tolerant by duplicating data on
different nodes, so that if one goes down the data is not necessarily lost. In that
case I would like to keep it at least at 2.

I have set dfs.replication to 2, but the processing time has not changed at all.
How could I change my configuration to avoid the hotspot issue you talked about?
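
For reference, one common way around a write hotspot is to salt the row key so that consecutive rows spread over several regions. The sketch below is only an illustration and rests on assumptions: that the hotspot comes from sequential row keys all landing in one region, that the table is pre-split to match the salt prefixes, and that readers can cope with the changed key layout. The names `NUM_BUCKETS` and `salted_key` and the sample keys are hypothetical and are not part of ImportTsv.

```python
import hashlib

# Hypothetical value: roughly the number of pre-split regions in the table.
NUM_BUCKETS = 16


def salted_key(row_key: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix the key with a stable two-digit bucket derived from its hash."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}-{row_key}"


# Made-up sequential keys (timestamp-like), which would otherwise sort
# next to each other and be written to the same region.
keys = [f"20121024{n:06d}" for n in range(1000)]
buckets_hit = {salted_key(k).split("-", 1)[0] for k in keys}
print(len(buckets_hit), "distinct buckets for", len(keys), "sequential keys")
```

The salt is derived from the key itself, so the same row always maps to the same prefix, but range scans over the original key order then have to fan out over every bucket.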

As Kevin advised, I have also raised:
hbase.hstore.blockingStoreFiles to 100
hbase.hregion.memstore.block.multiplier to 7
hbase.hregion.memstore.flush.size to 256 MB
hbase.regionserver.optionallogflushinterval to 30s
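
To make the settings above concrete, here is a small sketch of the raw values they correspond to. It assumes, as I understand the HBase defaults documentation, that the flush size is given in bytes, that the log flush interval is given in milliseconds, and that updates to a region are blocked once its memstore reaches multiplier times flush size. The variable names are only for illustration.

```python
MB = 1024 * 1024

# Values from the list above, converted to the assumed configuration units.
flush_size_bytes = 256 * MB          # hbase.hregion.memstore.flush.size
block_multiplier = 7                 # hbase.hregion.memstore.block.multiplier
log_flush_interval_ms = 30 * 1000    # hbase.regionserver.optionallogflushinterval

# Memstore size per region at which updates are blocked (assumed semantics:
# multiplier * flush size).
blocking_threshold_bytes = flush_size_bytes * block_multiplier

print("flush size (bytes):", flush_size_bytes)
print("log flush interval (ms):", log_flush_interval_ms)
print("per-region blocking threshold (MB):", blocking_threshold_bytes // MB)
```

With these numbers a single region can buffer up to 1792 MB before writes block, which is a large share of a 16 GB machine if several regions fill up at once.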

However, the ImportTsv map phase still takes around 1 minute per 1% of map
progress, so over an hour in total.

Currently I have 42 running map tasks and an average of 28 tasks/node. A lot of
my map tasks end with "failed to report status for 601 seconds".

My cluster is 3 Ubuntu machines:
 2 cores, 4 threads, 3.4+ GHz, with 16 GB RAM

With bulk load the process finishes in around 20 minutes. But I am surprised that
it takes more than an hour to insert 5 GB of data into HBase without bulk load. I
feel there is something I'm not getting.
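
As a rough back-of-the-envelope check, the figures above can be turned into throughput numbers. This is only a sketch: it treats "more than an hour" as exactly 60 minutes, so the real rate without bulk load is somewhat lower, and it assumes the load is spread evenly over the 3 machines.

```python
DATA_MB = 5 * 1024   # 5 GB of input data
NODES = 3            # machines in the cluster


def throughput_mb_per_s(minutes: float) -> float:
    """Average cluster-wide ingest rate in MB/s for a given run time."""
    return DATA_MB / (minutes * 60)


put_rate = throughput_mb_per_s(60)    # without bulk load, at best
bulk_rate = throughput_mb_per_s(20)   # with bulk load

print(f"without bulk load: {put_rate:.2f} MB/s total, {put_rate / NODES:.2f} MB/s per node")
print(f"with bulk load:    {bulk_rate:.2f} MB/s total, {bulk_rate / NODES:.2f} MB/s per node")
```

Under these assumptions the path without bulk load averages under 1.5 MB/s for the whole cluster, which is why the numbers look wrong to me.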
