hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: significant scan performance difference between Thrift(c++) and Java: 4X slower
Date Sun, 08 Mar 2015 14:29:16 GMT
JNI example? 

I don’t have one. My clients own the code, so I can’t take it with me and share it.
(The joys of being a consultant: you can’t take it with you, and you need to make sure
you don’t transfer IP accidentally.)


Maybe in one of the HBase books? Or just google for a JNI example on the web, since it’s
straightforward Java code to connect to HBase and then straight JNI to talk to C/C++.


> On Mar 7, 2015, at 5:56 PM, Demai Ni <nidmgg@gmail.com> wrote:
> 
> Nick, thanks. I will give REST a try. However, if it uses the same design,
> the result will probably be the same.
> 
> Michael, I was thinking about the same thing through JNI. Is there an
> example I can follow?
> 
> Mike (Axiak), I run the C++ client on the same Linux machine as HBase
> and Thrift. HBase uses IP 127.0.0.1 and Thrift uses 0.0.0.0. It doesn't
> make a difference, does it?
> 
> Anyway, Thrift gets the scan result from HBase first, and then my C++
> client gets the same data from Thrift, so it probably costs
> double the time/CPU. So JNI may be the right way to go. Is there an example
> I can use? Thanks
> 
> Demai
> 
> On Sat, Mar 7, 2015 at 1:54 PM, Mike Axiak <mike@axiak.net> wrote:
> 
>> What if you install the thrift server locally on every C++ client
>> machine? I'd imagine performance should be similar to native java
>> performance at that point.
>> 
>> -Mike
>> 
>> On Sat, Mar 7, 2015 at 4:49 PM, Michael Segel <michael_segel@hotmail.com>
>> wrote:
>>> Or you could try a Java connection wrapped by JNI so you can call it
>>> from your C++ app.
>>> 
>>>> On Mar 7, 2015, at 1:00 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
>>>> 
>>>> You can try the REST gateway, though it has the same basic architecture as
>>>> the Thrift gateway. Maybe the details work out in your favor over REST.
>>>> 
>>>> On Fri, Mar 6, 2015 at 11:31 PM, nidmgg <nidmgg@gmail.com> wrote:
>>>> 
>>>>> Stack,
>>>>> 
>>>>> Thanks for the quick response. Well, the extra layer really kills the
>>>>> performance. The 'hop' is so expensive.
>>>>> 
>>>>> Is there another C/C++ API to try out? I saw there is a JIRA, HBASE-1015,
>>>>> but it has been inactive for a while.
>>>>> 
>>>>> Demai
>>>>> 
>>>>> Stack <stack@duboce.net> wrote:
>>>>> 
>>>>>> Is it because of the 'hop'? Java goes against the RS. The Thrift C++ client
>>>>>> goes to a thriftserver which hosts a Java client, and then it goes to the RS?
>>>>>> St.Ack
>>>>>> 
>>>>>> On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni <nidmgg@gmail.com> wrote:
>>>>>> 
>>>>>>> hi, guys,
>>>>>>> 
>>>>>>> I am trying to get a rough idea of how the C++ and Java clients compare
>>>>>>> in performance when accessing an HBase table, and am surprised to find
>>>>>>> that Thrift (C++) is 4X slower.
>>>>>>> 
>>>>>>> The performance result is:
>>>>>>> C++:  real *16m11.313s*; user 5m3.642s;  sys 2m21.388s
>>>>>>> Java: real *4m6.012s*;   user 0m31.228s; sys 0m8.018s
>>>>>>> 
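To put a number on the "4X", the timings above can be converted to seconds and divided. This is only a small arithmetic sketch: the helper names (`parseDuration`, `slowdown`, `wallClockRatio`, `cpuRatio`) are made up for illustration, and it assumes the figures are in the usual `<minutes>m<seconds>s` format printed by `time`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert a `time`-style duration such as "16m11.313s" into seconds.
double parseDuration(const std::string& text) {
    const std::size_t m = text.find('m');
    if (m == std::string::npos || text.empty() || text.back() != 's') {
        throw std::invalid_argument("expected <minutes>m<seconds>s, got: " + text);
    }
    const double minutes = std::stod(text.substr(0, m));
    // Everything between 'm' and the trailing 's' is the seconds part.
    const double seconds = std::stod(text.substr(m + 1, text.size() - m - 2));
    return minutes * 60.0 + seconds;
}

// Ratio of two durations; the denominator must be positive.
double slowdown(double slower, double faster) {
    assert(faster > 0.0);
    return slower / faster;
}

// Wall-clock ratio from the posted numbers (C++ over Java).
double wallClockRatio() {
    return slowdown(parseDuration("16m11.313s"), parseDuration("4m6.012s"));
}

// Client-side CPU ratio (user + sys), C++ over Java.
double cpuRatio() {
    const double cpp = parseDuration("5m3.642s") + parseDuration("2m21.388s");
    const double java = parseDuration("0m31.228s") + parseDuration("0m8.018s");
    return slowdown(cpp, java);
}
```

With the posted figures, `wallClockRatio()` comes out at roughly 3.95 and `cpuRatio()` at roughly 11.3, so the gap in client CPU time is larger than the gap in elapsed time. If the numbers came from running `time` on the client process, they do not include whatever CPU the Thrift server itself spent.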
>>>>>>> 
>>>>>>> I have a single-node HBase(98.6) cluster with 1X TPCH loaded, and I use the
>>>>>>> largest table, lineitem, which has 6M rows and roughly 600MB of data.
>>>>>>> 
>>>>>>> For the C++ client, I used the Thrift example provided by hbase-examples.
>>>>>>> The C++ code looks like:
>>>>>>> 
>>>>>>>> std::string t("lineitem");
>>>>>>>> int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
>>>>>>>> int count = 0;
>>>>>>>> ..
>>>>>>>> while (true) {
>>>>>>>>   std::vector<TRowResult> value;
>>>>>>>>   client.scannerGet(value, scanner);
>>>>>>>>   if (value.size() == 0) break;
>>>>>>>>   count++;
>>>>>>>> }
>>>>>>>> 
>>>>>>>> std::cout << count << " rows scanned" << std::endl;
>>>>>>>> 
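One thing the loop above makes visible is that it counts one row per `scannerGet` call, so the number of client-to-Thrift round trips grows with the number of rows. The sketch below only illustrates that counting. `MockScanner`, `getList`, and `scanAll` are hypothetical stand-ins, not the Thrift-generated client, and nothing here talks to HBase. As far as I know the HBase Thrift interface also declares `scannerGetList(id, nbRows)`, whose shape `getList` imitates; whether batching would close the gap reported in this thread is an assumption that would need measuring.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Thrift scanner: serves `total` fake rows and
// counts how many calls (round trips) the caller makes.
struct MockScanner {
    std::size_t total;
    std::size_t served = 0;
    std::size_t roundTrips = 0;

    explicit MockScanner(std::size_t totalRows) : total(totalRows) {}

    // Imitates the shape of scannerGetList: returns up to nbRows rows per
    // call, and an empty vector once the scan is exhausted.
    void getList(std::vector<std::size_t>& out, std::size_t nbRows) {
        ++roundTrips;
        out.clear();
        const std::size_t n = std::min(nbRows, total - served);
        for (std::size_t i = 0; i < n; ++i) {
            out.push_back(served + i);  // fake row id
        }
        served += n;
    }
};

// Drain the scanner, asking for batchSize rows per call; returns the row count.
std::size_t scanAll(MockScanner& scanner, std::size_t batchSize) {
    std::size_t count = 0;
    std::vector<std::size_t> rows;
    while (true) {
        scanner.getList(rows, batchSize);
        if (rows.empty()) break;
        count += rows.size();  // count rows, not calls
    }
    return count;
}
```

For example, draining a `MockScanner(1000)` with `scanAll(scanner, 1)` takes 1001 calls (the last one returns nothing), while `scanAll(scanner, 100)` takes 11. Note that with batching the loop has to add `rows.size()` rather than increment by one per call.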
>>>>>>> 
>>>>>>> The Java client is the simplest possible one:
>>>>>>> 
>>>>>>>>   HTable table = new HTable(conf,"lineitem");
>>>>>>>> 
>>>>>>>>   Scan scan = new Scan();
>>>>>>>>   ResultScanner resScanner;
>>>>>>>>   resScanner = table.getScanner(scan);
>>>>>>>>   int count = 0;
>>>>>>>>   for (Result res: resScanner) {
>>>>>>>>     count ++;
>>>>>>>>   }
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Since most of the time should be spent on I/O, I don't expect any significant
>>>>>>> difference between Thrift(C++) and Java. Any ideas? Many thanks
>>>>>>> 
>>>>>>> Demai
>>>>>>> 
>>>>> 
>>> 
>>> The opinions expressed here are mine, while they may reflect a cognitive
>>> thought, that is purely accidental.
>>> Use at your own risk.
>>> Michael Segel
>>> michael_segel (AT) hotmail.com
>>> 
>>> 
>>> 
>>> 
>>> 
>> 

The opinions expressed here are mine, while they may reflect a cognitive thought, that is
purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com





