From: Michael Segel
To: user@hbase.apache.org
Reply-To: user@hbase.apache.org
Subject: Re: significant scan performance difference between Thrift(c++) and Java: 4X slower
Date: Sat, 7 Mar 2015 15:49:01 -0600

Or you could try a Java connection wrapped by JNI so you can call it from your C++ app.

> On Mar 7, 2015, at 1:00 PM, Nick Dimiduk wrote:
>
> You can try the REST gateway, though it has the same basic architecture as
> the Thrift gateway. Maybe the details work out in your favor over REST.
>
> On Fri, Mar 6, 2015 at 11:31 PM, nidmgg wrote:
>
>> Stack,
>>
>> Thanks for the quick response. Well, the extra layer really kills the
>> performance. The 'hop' is so expensive.
>>
>> Is there another C/C++ API to try out? I saw there is a JIRA, HBASE-1015,
>> but it has been inactive for a while.
>>
>> Demai
>>
>> Stack wrote:
>>
>>> Is it because of the 'hop'? Java goes against the RS. The Thrift C++ client
>>> goes to a thriftserver, which hosts a Java client, and that then goes to the RS?
>>> St.Ack
>>>
>>> On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni wrote:
>>>
>>>> hi, guys,
>>>>
>>>> I am trying to get a rough idea of how the C++ and Java clients compare
>>>> in performance when accessing an HBase table, and was surprised to find
>>>> out that Thrift (C++) is 4X slower.
>>>>
>>>> The performance result is:
>>>> C++: real *16m11.313s*; user 5m3.642s; sys 2m21.388s
>>>> Java: real *4m6.012s*; user 0m31.228s; sys 0m8.018s
>>>>
>>>>
>>>> I have a single-node HBase (98.6) cluster with 1X TPC-H loaded, and I use
>>>> the largest table, lineitem, which has 6M rows, roughly 600MB of data.
>>>>
>>>> For the C++ client, I used the Thrift example provided by hbase-examples;
>>>> the C++ code looks like:
>>>>
>>>>> std::string t("lineitem");
>>>>> int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
>>>>> int count = 0;
>>>>> ..
>>>>> while (true) {
>>>>>   std::vector<TRowResult> value;
>>>>>   // one Thrift call per row; an empty result means the scan is done
>>>>>   client.scannerGet(value, scanner);
>>>>>   if (value.size() == 0) break;
>>>>>   count++;
>>>>> }
>>>>>
>>>>> std::cout << count << " rows scanned" << std::endl;
>>>>>
>>>>
>>>> The Java client is the simplest possible one:
>>>>
>>>>> HTable table = new HTable(conf, "lineitem");
>>>>>
>>>>> Scan scan = new Scan();
>>>>> ResultScanner resScanner;
>>>>> resScanner = table.getScanner(scan);
>>>>> int count = 0;
>>>>> for (Result res : resScanner) {
>>>>>   count++;
>>>>> }
>>>>>
>>>>
>>>>
>>>>
>>>> Since most of the time should be spent on I/O, I didn't expect any
>>>> significant difference between Thrift (C++) and Java. Any ideas? Many thanks.
>>>>
>>>> Demai
>>>>
>>

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com
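
For anyone reading this thread later, here is a rough back-of-the-envelope sketch of the numbers quoted above. It only does arithmetic on the two "real" timings and the approximate "6M rows" figure from the original post; the function and constant names are made up for illustration and are not part of any HBase or Thrift API. The last helper counts how many client calls it takes to drain a scanner at a given number of rows per call, since the C++ loop above makes one scannerGet call per row and each call pays for the extra hop. The Thrift interface also has scannerGetList, which takes a row count, but whether batching would close the gap on this setup is untested here.

```cpp
#include <cstdint>

// Wall-clock ("real") times reported in the thread, converted to seconds.
constexpr double kCppRealSeconds  = 16 * 60 + 11.313;  // 16m11.313s
constexpr double kJavaRealSeconds = 4 * 60 + 6.012;    // 4m6.012s

// Row count as quoted in the thread ("6M rows"); an approximation, not a measurement.
constexpr std::int64_t kRows = 6000000;

// How many times slower one run was than the other.
constexpr double slowdown(double slow_seconds, double fast_seconds) {
    return slow_seconds / fast_seconds;
}

// Average wall-clock cost per row, in microseconds.
constexpr double micros_per_row(double seconds, std::int64_t rows) {
    return seconds * 1e6 / static_cast<double>(rows);
}

// Client calls needed to fetch `rows` rows when each call returns at most
// `rows_per_call` rows (ceiling division). The final empty call that signals
// end-of-scan is not counted.
constexpr std::int64_t calls_needed(std::int64_t rows, std::int64_t rows_per_call) {
    return (rows + rows_per_call - 1) / rows_per_call;
}
```

With these inputs, slowdown(kCppRealSeconds, kJavaRealSeconds) comes out just under 4, which matches the "4X" in the subject line, and the per-row averages are roughly 162 microseconds for the Thrift C++ run versus roughly 41 microseconds for Java. calls_needed(kRows, 1) is 6,000,000 calls, while calls_needed(kRows, 1000) would be 6,000; the 1000 is an arbitrary illustrative batch size, not a recommendation.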