hbase-user mailing list archives

From "Ryan Rawson" <ryano...@gmail.com>
Subject Performance of hbase importing
Date Sun, 11 Jan 2009 22:11:46 GMT
Hi all,

New user of hbase here. I've been trolling about in IRC for a few days, and
been getting great help all around so far.

The topic turns to importing data into hbase - I have largeish datasets I
want to evaluate hbase performance on, so I've been working at importing
said data.  I've managed to get some impressive performance speedups, and I
chronicled them here:

http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html

To summarize:
- Use the native HBase API from Java or Jython (or presumably any JVM
language)
- Disable table auto-flush and set a large write buffer (12 MB for me)
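
To illustrate why the second point helps, here is a minimal sketch in plain
Python (it should also run under Jython). It is a toy model, not the HBase
client: the BufferedTable class, its counters, and import_rows are
hypothetical names made up for this example. In the HBase Java client the
corresponding settings are, as far as I know, HTable.setAutoFlush(false) and
HTable.setWriteBufferSize(...). The sketch only counts how many simulated
round trips to the server an import needs with and without buffering.

```python
class BufferedTable(object):
    """Toy stand-in for a table client. This is NOT the HBase API."""

    def __init__(self, auto_flush=True, write_buffer_size=12 * 1024 * 1024):
        self.auto_flush = auto_flush
        self.write_buffer_size = write_buffer_size  # bytes held before a flush
        self.buffer = []         # pending (row_key, value) pairs
        self.buffered_bytes = 0  # approximate size of the pending pairs
        self.round_trips = 0     # simulated calls to the server
        self.rows_sent = 0       # rows delivered so far

    def put(self, row_key, value):
        """Queue one row; send immediately if auto-flush is on or the buffer is full."""
        self.buffer.append((row_key, value))
        self.buffered_bytes += len(row_key) + len(value)
        if self.auto_flush or self.buffered_bytes >= self.write_buffer_size:
            self.flush()

    def flush(self):
        """Send everything that is pending in a single simulated round trip."""
        if not self.buffer:
            return
        self.round_trips += 1
        self.rows_sent += len(self.buffer)
        self.buffer = []
        self.buffered_bytes = 0


def import_rows(table, n_rows):
    """Write n_rows small rows (44 bytes each), then flush whatever is left."""
    for i in range(n_rows):
        table.put("row%09d" % i, "x" * 32)
    table.flush()
    return table.round_trips
```

With auto-flush on, every put costs one round trip. With auto-flush off, the
number of round trips drops to roughly the total bytes written divided by the
write buffer size, which is where the speedup in this toy model comes from.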

At this point I can import an 18 GB, 440-million-row comma-separated flat file
in about 72 minutes using map-reduce.  This is on a 3-node cluster, with every
node running HDFS, HBase, and MapReduce, and 12 map tasks in total (4 per
node).  This hardware is loaner DB hardware, so once I get my real cluster
I'll revise/publish new data.
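
For a rough sense of scale, the figures above can be turned into rates. This
is back-of-the-envelope arithmetic only: it assumes GB means 2**30 bytes and
that the load was spread evenly across nodes and map tasks, neither of which
was measured. The function name import_rates is made up for this sketch.

```python
ROWS = 440 * 10**6          # rows in the flat file
SIZE_BYTES = 18 * 1024**3   # assumption: "18 GB" means 18 * 2**30 bytes
MINUTES = 72                # approximate wall-clock import time
NODES = 3
MAP_TASKS = 12


def import_rates(rows=ROWS, size_bytes=SIZE_BYTES, minutes=MINUTES,
                 nodes=NODES, map_tasks=MAP_TASKS):
    """Return average rates implied by the totals, assuming an even spread."""
    seconds = minutes * 60.0
    return {
        "rows_per_sec": rows / seconds,
        "rows_per_sec_per_node": rows / seconds / nodes,
        "rows_per_sec_per_map_task": rows / seconds / map_tasks,
        "mb_per_sec": size_bytes / seconds / 1024**2,
        "bytes_per_row": float(size_bytes) / rows,
    }


if __name__ == "__main__":
    for name, value in sorted(import_rates().items()):
        print("%-26s %12.1f" % (name, value))
```

That works out to roughly 100,000 rows per second across the cluster, at an
average of about 44 bytes per row.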

I look forward to meeting some of you next week at the hbase meetup at
powerset!

-ryan
