hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject RE: TableOutputFormat not efficient than direct HBase API calls?
Date Wed, 22 Jun 2011 02:31:55 GMT

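TableOutputFormat also does this when it sets up its HTable:

    table.setAutoFlush(false);

With autoflush off, each put() goes into the client-side write buffer and is only sent to the region servers when the buffer fills up or the table is flushed or closed, so the writes are already batched. Check out the HBase book for how the write buffer works with the HBase client.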

http://hbase.apache.org/book.html#client
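
To make the effect of the write buffer concrete, here is a minimal toy model in plain Java. It is not the HBase client: BufferedTable, writeRows, and the counters are hypothetical names invented for illustration, the row "size" is just the string length, and each flush is counted as a single round trip, which is a simplification of what a real client does. It only shows why turning autoflush off turns many put() calls into a few batched sends.

```java
import java.util.ArrayList;
import java.util.List;

public class WriteBufferSketch {

    /** Toy stand-in for a table client. NOT the HBase HTable class. */
    static class BufferedTable {
        private final long writeBufferSize;      // flush threshold, in "bytes"
        private boolean autoFlush = true;        // assumed default: send every put at once
        private final List<String> buffer = new ArrayList<>();
        private long bufferedBytes = 0;
        private int roundTrips = 0;              // simulated sends to the server
        private int rowsSent = 0;

        BufferedTable(long writeBufferSize) {
            this.writeBufferSize = writeBufferSize;
        }

        void setAutoFlush(boolean autoFlush) {
            this.autoFlush = autoFlush;
        }

        /** Buffers the row; sends only if autoflush is on or the buffer is full. */
        void put(String row) {
            buffer.add(row);
            bufferedBytes += row.length();
            if (autoFlush || bufferedBytes >= writeBufferSize) {
                flushCommits();
            }
        }

        /** Sends everything buffered so far as one simulated round trip. */
        void flushCommits() {
            if (buffer.isEmpty()) {
                return;
            }
            roundTrips++;
            rowsSent += buffer.size();
            buffer.clear();
            bufferedBytes = 0;
        }

        /** Flushes whatever is left, as a record writer would on close. */
        void close() {
            flushCommits();
        }

        int roundTrips() { return roundTrips; }
        int rowsSent() { return rowsSent; }
    }

    /** Writes `rows` rows and returns {roundTrips, rowsSent}. */
    public static int[] writeRows(boolean autoFlush, int rows, long writeBufferSize) {
        BufferedTable table = new BufferedTable(writeBufferSize);
        table.setAutoFlush(autoFlush);
        for (int i = 0; i < rows; i++) {
            table.put(String.format("row-%04d", i)); // 8 characters per row
        }
        table.close();
        return new int[] { table.roundTrips(), table.rowsSent() };
    }

    public static void main(String[] args) {
        int[] auto = writeRows(true, 1000, 800);
        int[] buffered = writeRows(false, 1000, 800);
        System.out.println("autoflush on : " + auto[0] + " round trips, " + auto[1] + " rows");
        System.out.println("autoflush off: " + buffered[0] + " round trips, " + buffered[1] + " rows");
    }
}
```

In this model the same 1000 rows arrive either way; only the number of simulated round trips changes, and that is the difference the autoflush setting is meant to make.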


-----Original Message-----
From: edward choi [mailto:mp2893@gmail.com] 
Sent: Tuesday, June 21, 2011 10:23 PM
To: common-user@hadoop.apache.org; user@hbase.apache.org
Subject: TableOutputFormat not efficient than direct HBase API calls?

Hi,

I am writing a Hadoop application that uses HBase as both source and sink.

The job is map-only; there is no reducer.

I am using TableOutputFormat as the OutputFormatClass.

I read on the Internet that, in experiments, it is faster to instantiate HTable directly
and use HTable.batch() in the mapper than to use TableOutputFormat as the OutputFormatClass.

So I looked into the source code of
org.apache.hadoop.hbase.mapreduce.TableOutputFormat.
It looks like TableRecordWriter does not support batch updates, since
TableRecordWriter.write() calls HTable.put(new Put()).

Am I right on this matter? Or does TableOutputFormat automatically do batch updates somehow?
Or is there a specific way to do batch updates with TableOutputFormat?

Any explanation is greatly appreciated.

Ed
