hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: significant scan performance difference between Thrift(c++) and Java: 4X slower
Date Sat, 07 Mar 2015 21:49:01 GMT
Or you could try a Java connection wrapped by JNI so you can call it from your C++ app.

> On Mar 7, 2015, at 1:00 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> 
> You can try the REST gateway, though it has the same basic architecture as
> the Thrift gateway. Maybe the details work out in your favor over REST.
> 
> On Fri, Mar 6, 2015 at 11:31 PM, nidmgg <nidmgg@gmail.com> wrote:
> 
>> Stack,
>> 
>> Thanks for the quick response. Well, the extra layer really kills the
>> performance. The 'hop' is so expensive.
>> 
>> Is there another C/C++ API to try out? I saw there is a JIRA, HBASE-1015,
>> but it has been inactive for a while.
>> 
>> Demai
>> 
>> Stack <stack@duboce.net> wrote:
>> 
>>> Is it because of the 'hop'? Java goes against the RS. The Thrift C++ client
>>> goes to a Thrift server, which hosts a Java client, and then it goes to the RS?
>>> St.Ack
>>> 
>>> On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni <nidmgg@gmail.com> wrote:
>>> 
>>>> hi, guys,
>>>> 
>>>> I am trying to get a rough idea of how the C++ and Java clients compare
>>>> in performance when accessing an HBase table, and am surprised to find
>>>> out that Thrift (C++) is 4X slower.
>>>> 
>>>> The performance result is:
>>>> C++:  real    *16m11.313s*; user    5m3.642s; sys    2m21.388s
>>>> Java: real    *4m6.012s*;user    0m31.228s; sys    0m8.018s
>>>> 
>>>> 
>>>> I have a single-node HBase (98.6) cluster with 1X TPC-H loaded, and use
>>>> the largest table, lineitem, which has 6M rows and roughly 600MB of data.
>>>> 
>>>> For the C++ client, I used the Thrift example provided by hbase-examples.
>>>> The C++ code looks like:
>>>> 
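>>>>> std::string t("lineitem");
>>>>> int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
>>>>> int count = 0;
>>>>> ..
>>>>> while (true) {
>>>>>   std::vector<TRowResult> value;
>>>>>   // Each scannerGet call is one round trip to the Thrift server
>>>>>   // and returns at most one row.
>>>>>   client.scannerGet(value, scanner);
>>>>>   if (value.empty()) break;  // an empty result means the scan is done
>>>>>   ++count;
>>>>> }
>>>>> client.scannerClose(scanner);  // release the server-side scanner
>>>>> 
>>>>> std::cout << count << " rows scanned" << std::endl;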
>>>>> 
>>>> 
>>>> The Java client is the simplest one:
>>>> 
>>>>>    HTable table = new HTable(conf, "lineitem");
>>>>> 
>>>>>    Scan scan = new Scan();
>>>>>    ResultScanner resScanner;
>>>>>    resScanner = table.getScanner(scan);
>>>>>    int count = 0;
>>>>>    for (Result res : resScanner) {
>>>>>      count++;
>>>>>    }
>>>>>    resScanner.close();  // release the scanner on the region server
>>>>>    table.close();
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> Since most of the time should be spent on I/O, I didn't expect any
>>>> significant difference between Thrift (C++) and Java. Any ideas? Many thanks.
>>>> 
>>>> Demai
>>>> 
>> 
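
To put the timings quoted above in per-row terms, here is a rough back-of-the-envelope sketch. It assumes the table holds exactly 6,000,000 rows (the thread only says "6M") and that the whole wall-clock gap between the two runs is per-row overhead of the extra hop, which is a simplification. The class name, the helper names, and the batch size of 1000 are made up for illustration. The last helper only counts how many client calls a scan needs when each call returns a fixed number of rows, as with the `scannerGetList(id, nbRows)` call in the HBase Thrift interface; it does not predict how much time batching would actually save.

```java
public class ScanOverheadEstimate {

    // Convert a `time`-style reading such as 16m11.313s into seconds.
    static double toSeconds(int minutes, double seconds) {
        return minutes * 60.0 + seconds;
    }

    // How many times slower the slow run is than the fast run.
    static double slowdown(double slowSec, double fastSec) {
        return slowSec / fastSec;
    }

    // Wall-clock gap between the two runs, spread evenly over all rows,
    // in microseconds per row.
    static double extraMicrosPerRow(double slowSec, double fastSec, long rows) {
        return (slowSec - fastSec) * 1_000_000.0 / rows;
    }

    // Number of client calls needed to fetch `rows` rows when every call
    // returns up to `rowsPerCall` rows (ceiling division).
    static long roundTrips(long rows, int rowsPerCall) {
        return (rows + rowsPerCall - 1) / rowsPerCall;
    }

    public static void main(String[] args) {
        double cppSec = toSeconds(16, 11.313);   // "real" time of the C++ run
        double javaSec = toSeconds(4, 6.012);    // "real" time of the Java run
        long rows = 6_000_000L;                  // assumption: "6M" taken literally

        System.out.printf("slowdown: %.2fx%n", slowdown(cppSec, javaSec));
        System.out.printf("extra time per row: %.1f us%n",
                extraMicrosPerRow(cppSec, javaSec, rows));
        System.out.printf("calls at 1 row per call: %d%n", roundTrips(rows, 1));
        // Hypothetical batch size, chosen only to show the scale of the change.
        System.out.printf("calls at 1000 rows per call: %d%n", roundTrips(rows, 1000));
    }
}
```

Under those assumptions the gap works out to a slowdown of about 3.9x, or roughly 120 microseconds of extra wall-clock time per row, while fetching 1000 rows per call would cut the number of client calls from 6,000,000 to 6,000.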

The opinions expressed here are mine, while they may reflect a cognitive thought, that is
purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com





