hbase-user mailing list archives

From Nick maillard <nicolas.maill...@fifty-five.com>
Subject Hbase import Tsv performance (slow import)
Date Tue, 23 Oct 2012 15:48:25 GMT
Hi everyone,

I'm starting out with HBase and testing it for our needs. I have set up a
Hadoop cluster of three machines and an HBase cluster on top of the same three
machines: one master and two slaves.

I am testing the import of a 5GB csv file with the importTsv tool. I first copy
the file into HDFS and then use the importTsv tool to load it into HBase.

Right now the import takes a little over an hour to complete. It creates around
2 million entries in one table with a single column family.
If I use bulk uploading instead, the time goes down to 20 minutes.
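
To put those figures in perspective, here is a rough back-of-the-envelope
calculation of the throughput in both cases. It is only a sketch: it rounds
"a little over an hour" down to 60 minutes, assumes 5GB means 5 * 2**30 bytes,
and the function name is made up for illustration.

```python
# Figures taken from the message above.
# Assumption: "5GB" is read as 5 * 2**30 bytes.
FILE_SIZE_BYTES = 5 * 2**30
ROW_COUNT = 2_000_000


def throughput(duration_minutes):
    """Return (rows per second, MiB per second) for a run of the given length."""
    seconds = duration_minutes * 60
    rows_per_s = ROW_COUNT / seconds
    mib_per_s = FILE_SIZE_BYTES / 2**20 / seconds
    return rows_per_s, mib_per_s


# Assumption: durations are the approximate ones quoted above.
for label, minutes in [("direct import", 60), ("bulk upload", 20)]:
    rows_per_s, mib_per_s = throughput(minutes)
    print(f"{label}: {rows_per_s:.0f} rows/s, {mib_per_s:.2f} MiB/s")
```

That works out to a few hundred entries per second for the direct import,
spread over the whole cluster, and about three times that for the bulk upload.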

The Hadoop job has 21 map tasks, but they all seem to take a very long time to
finish, and many of them end up timing out.

I am wondering what I have missed in my configuration. I have followed the
prerequisites listed in the documentation, but I am really unsure what is
causing this slowdown. If I run the wordcount example on the same file, it
takes only minutes to complete, so I am guessing the issue lies in my HBase
configuration.

Any help or pointers would be appreciated.
