From: Doug Meil
To: "user@hbase.apache.org"
Date: Tue, 21 Jun 2011 22:31:55 -0400
Subject: RE: TableOutputFormat not efficient than direct HBase API calls?

TableOutputFormat also does this...
table.setAutoFlush(false);

Check out the HBase book for how the write buffer works with the HBase client.

http://hbase.apache.org/book.html#client

-----Original Message-----
From: edward choi [mailto:mp2893@gmail.com]
Sent: Tuesday, June 21, 2011 10:23 PM
To: common-user@hadoop.apache.org; user@hbase.apache.org
Subject: TableOutputFormat not efficient than direct HBase API calls?

Hi,

I am writing a Hadoop application that uses HBase as both source and sink.

There is no reducer job in my application.

I am using TableOutputFormat as the OutputFormatClass.

I read on the Internet that, in experiments, it is faster to directly instantiate HTable and use HTable.batch() in the Map than to use TableOutputFormat as the Map's OutputClass.

So I looked into the source code, org.apache.hadoop.hbase.mapreduce.TableOutputFormat.
It looked like TableRecordWriter does not support batch updates, since TableRecordWriter.write() calls HTable.put(new Put()).

Am I right on this matter? Or does TableOutputFormat automatically do batch updates somehow?
Or is there a specific way to do batch updates with TableOutputFormat?

Any explanation is greatly appreciated.

Ed
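
To illustrate why put() with auto-flush turned off behaves like a batch update, here is a minimal sketch of a client-side write buffer in plain Java. It is a toy model, not HBase code: the class WriteBufferSketch, its fields, and the byte-count threshold are all hypothetical names chosen for illustration, and the real client's buffering rules may differ in detail. The only point is that each put() is held locally and shipped in groups once the buffer fills, or when the caller flushes explicitly.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of a client-side write buffer. Hypothetical illustration only;
 * this is not the HBase client and does not talk to any server.
 */
public class WriteBufferSketch {
    private final List<byte[]> buffer = new ArrayList<>();
    private final boolean autoFlush;      // true: send every put immediately
    private final long maxBufferBytes;    // assumed flush threshold in bytes
    private long bufferedBytes = 0;
    private int flushCount = 0;           // number of simulated round trips
    private int rowsSent = 0;             // total rows handed to the "server"

    public WriteBufferSketch(boolean autoFlush, long maxBufferBytes) {
        this.autoFlush = autoFlush;
        this.maxBufferBytes = maxBufferBytes;
    }

    /** Queue one row; flush if auto-flush is on or the buffer is over its limit. */
    public void put(byte[] row) {
        buffer.add(row);
        bufferedBytes += row.length;
        if (autoFlush || bufferedBytes > maxBufferBytes) {
            flush();
        }
    }

    /** Send everything currently buffered as one simulated round trip. */
    public void flush() {
        if (buffer.isEmpty()) {
            return; // nothing to send, so no round trip is counted
        }
        rowsSent += buffer.size();
        flushCount++;
        buffer.clear();
        bufferedBytes = 0;
    }

    public int getFlushCount() { return flushCount; }

    public int getRowsSent() { return rowsSent; }

    public static void main(String[] args) {
        byte[] row = new byte[100];
        WriteBufferSketch unbuffered = new WriteBufferSketch(true, 1024);
        WriteBufferSketch buffered = new WriteBufferSketch(false, 1024);
        for (int i = 0; i < 1000; i++) {
            unbuffered.put(row);
            buffered.put(row);
        }
        // A final explicit flush sends whatever is still sitting in the buffer.
        unbuffered.flush();
        buffered.flush();
        System.out.println("autoFlush=true  round trips: " + unbuffered.getFlushCount());
        System.out.println("autoFlush=false round trips: " + buffered.getFlushCount());
    }
}
```

In this model both writers deliver the same rows, but the buffered one makes far fewer round trips. The trade-off is that buffered rows are not sent until the threshold is crossed or flush() is called, so the final explicit flush matters.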