hbase-user mailing list archives

From Larry Compton <lawrence.comp...@gmail.com>
Subject Re: Performance of hbase importing
Date Thu, 15 Jan 2009 19:03:15 GMT
I'm interested in trying this, but I'm not seeing "setAutoFlush()" and
"setWriteBufferSize()" in the "HTable" API (I'm using HBase 0.18.1).

Larry

On Sun, Jan 11, 2009 at 5:11 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:

> Hi all,
>
> New user of hbase here. I've been trolling about in IRC for a few days, and
> been getting great help all around so far.
>
> The topic turns to importing data into hbase - I have largeish datasets I
> want to evaluate hbase performance on, so I've been working at importing
> said data.  I've managed to get some impressive performance speedups, and I
> chronicled them here:
>
>
> http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html
>
> To summarize:
> - Use the Native HBASE API in Java or Jython (or presumably any JVM
> language)
> - Disable table auto flush, set write buffer large (12M for me)
>
> At this point I can import an 18 GB, 440m row comma-separated flat file in
> about 72 minutes using map-reduce.  This is on a 3 node cluster all running
> hdfs, hbase, mapred with 12 map tasks (4 per node).  This hardware is loaner DB
> hardware, so once I get my real cluster I'll revise/publish new data.
>
> I look forward to meeting some of you next week at the hbase meetup at
> powerset!
>
> -ryan
>
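
For readers following the thread, the effect of "disable auto flush, set write buffer large" can be sketched without any HBase dependency. The class below is a hypothetical stand-in, not the HBase HTable API: BufferedTableClient, importRows, and the 40-byte row size are assumptions made for illustration (18 GB over 440m rows works out to roughly 40 bytes per row). It only counts how many simulated round trips a client makes when each put is sent immediately versus when puts are accumulated in a client-side buffer.

```java
import java.util.ArrayList;
import java.util.List;

public class WriteBufferSketch {

    /** Hypothetical table client; a simulation, NOT the real HBase HTable class. */
    static class BufferedTableClient {
        private final boolean autoFlush;      // true: send every put immediately
        private final long writeBufferSize;   // flush threshold in bytes
        private final List<byte[]> pending = new ArrayList<>();
        private long pendingBytes = 0;
        private int roundTrips = 0;           // simulated RPCs to the server
        private long rowsSent = 0;

        BufferedTableClient(boolean autoFlush, long writeBufferSize) {
            this.autoFlush = autoFlush;
            this.writeBufferSize = writeBufferSize;
        }

        /** Queue one row; flush if auto flush is on or the buffer is full. */
        void put(byte[] row) {
            pending.add(row);
            pendingBytes += row.length;
            if (autoFlush || pendingBytes >= writeBufferSize) {
                flush();
            }
        }

        /** Send everything queued so far as a single simulated round trip. */
        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            roundTrips++;
            rowsSent += pending.size();
            pending.clear();
            pendingBytes = 0;
        }

        int roundTrips() { return roundTrips; }
        long rowsSent() { return rowsSent; }
    }

    /** Import `rows` rows of `rowBytes` bytes each and return the round-trip count. */
    static int importRows(boolean autoFlush, long bufferBytes, int rows, int rowBytes) {
        BufferedTableClient client = new BufferedTableClient(autoFlush, bufferBytes);
        byte[] row = new byte[rowBytes];
        for (int i = 0; i < rows; i++) {
            client.put(row);
        }
        client.flush(); // push whatever is still buffered at the end
        if (client.rowsSent() != rows) {
            throw new IllegalStateException("rows lost in buffer");
        }
        return client.roundTrips();
    }

    public static void main(String[] args) {
        int rows = 1_000_000;   // assumed sample size
        int rowBytes = 40;      // assumed average row size
        System.out.println("auto flush on : " + importRows(true, 0, rows, rowBytes) + " round trips");
        System.out.println("12 MB buffer  : " + importRows(false, 12L * 1024 * 1024, rows, rowBytes) + " round trips");
    }
}
```

The final flush() after the loop matters in this sketch: with auto flush off, any rows still sitting in the buffer are never sent unless the caller flushes explicitly. Whether and how a given HBase release exposes these settings should be checked against that release's API documentation.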
