hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: HBase performace & bulk load
Date Thu, 22 Jul 2010 20:43:51 GMT

This is bad; you must be doing something slow, like creating a new
HTable for each put call. Also, since you manage the HTable yourself,
you need to use the write buffer (disable auto flushing, then set the
write buffer size on the HTable during the map configuration).
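
To make the write buffer point concrete, here is a rough, self-contained
sketch. It is a toy model, not the HBase client: FakeTable, roundTripsFor,
and the row and buffer sizes are all made up for illustration. It only
counts how many simulated round trips are needed with auto flushing on
versus off. In the real client of this era the corresponding calls are
HTable.setAutoFlush(false), HTable.setWriteBufferSize(long), and
HTable.flushCommits(), made on a single HTable created once per mapper.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of client-side write buffering. NOT the HBase API. */
public class WriteBufferSketch {

    /** Hypothetical stand-in for a table client that counts round trips. */
    static class FakeTable {
        private final List<byte[]> buffer = new ArrayList<>();
        private final boolean autoFlush;
        private final long writeBufferSize;
        private long bufferedBytes = 0;
        long roundTrips = 0;
        long rowsSent = 0;

        FakeTable(boolean autoFlush, long writeBufferSize) {
            this.autoFlush = autoFlush;
            this.writeBufferSize = writeBufferSize;
        }

        /** Buffers one row; sends immediately if auto flush is on or the buffer is full. */
        void put(byte[] row) {
            buffer.add(row);
            bufferedBytes += row.length;
            if (autoFlush || bufferedBytes >= writeBufferSize) {
                flushCommits();
            }
        }

        /** Sends everything buffered so far as one simulated round trip. */
        void flushCommits() {
            if (buffer.isEmpty()) {
                return;
            }
            roundTrips++;
            rowsSent += buffer.size();
            buffer.clear();
            bufferedBytes = 0;
        }
    }

    /** Writes `rows` rows of `rowBytes` bytes each and returns the round trips used. */
    static long roundTripsFor(int rows, int rowBytes, boolean autoFlush, long bufferBytes) {
        // The table is created once and reused for every put, never once per put.
        FakeTable table = new FakeTable(autoFlush, bufferBytes);
        byte[] row = new byte[rowBytes];
        for (int i = 0; i < rows; i++) {
            table.put(row);
        }
        // Flush whatever is left at the end, as a mapper would do when it finishes.
        table.flushCommits();
        return table.roundTrips;
    }

    public static void main(String[] args) {
        // Auto flush on: one round trip per row.
        System.out.println(roundTripsFor(100_000, 100, true, 0));
        // Auto flush off with a 1,000,000-byte buffer: one round trip per 10,000 rows.
        System.out.println(roundTripsFor(100_000, 100, false, 1_000_000));
    }
}
```

With these made-up numbers the first call reports 100000 round trips and
the second reports 10. Real gains depend on row size, buffer size, and the
cluster, so treat the ratio as an illustration only.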

The bulk load tool is widely used; you should give it a try if
you only have 1 family.


On Thu, Jul 22, 2010 at 1:06 PM, HAN LIU <hanl1@andrew.cmu.edu> wrote:
> Hi Guys,
> I've been doing some data insertion from HDFS to HBase and the performance seems to be
> really bad. It took about 3 hours to insert 15 GB of data. The mapreduce job is launched
> from one machine which grabs data from HDFS and inserts it into an HTable located on 3 other
> machines (1 master and 2 regionservers). There are 17 map tasks in total (no reduce tasks), representing
> 17 files each about 1GB in size. The mapper simply extracts the useful information from each
> of these files and inserts it into HBase. In the end there are about 22 million rows added
> to the table, and with my implementation (pretty inefficient I think), for each of these
> rows a 'table.put(Put p)' method is called once, so in the end there are 22 million 'table.put()'
> calls.
> Does it make sense that this many 'table.put' calls take 3 hours? I have played
> with my code and I have determined that the bottleneck is these 'table.put()' calls, because
> if I remove them, the rest of the code (doing every part of the job except for committing
> the updates via 'table.put()') only takes 2 minutes to run. I am really inexperienced in HBase,
> so how do you guys usually do data insertion? What could be the tricks to enhance performance?
> I am thinking about using the bulk load feature to batch insert data into HBase. Is this
> a popular method out there in the HBase community?
> Really sorry about asking for so much help with my problems but not helping other people with
> theirs. I really would like to offer help once I get more experienced with HBase.
> Thanks a lot in advance :)
> ----
> Han Liu
> SCS & HCI Institute
> Undergrad. Class of 2012
> Carnegie Mellon University
