hbase-user mailing list archives

From Demai Ni <nid...@gmail.com>
Subject Re: significant scan performance difference between Thrift(c++) and Java: 4X slower
Date Sat, 07 Mar 2015 23:56:07 GMT
Nick, thanks. I will give REST a try. However, if it uses the same design,
the result will probably be the same.

Michael, I was thinking about the same thing through JNI. Is there an
example I can follow?

Mike (Axiak), I run the C++ client on the same Linux machine as HBase and
the Thrift server. HBase uses IP 127.0.0.1 and Thrift uses 0.0.0.0. That
doesn't make a difference, does it?

Anyway, Thrift has to get the scan result from HBase first, and then my
C++ client fetches the same data from Thrift. That probably costs about
double the time/CPU. So JNI may be the right way to go. Is there an example
I can use? Thanks.

Demai

On Sat, Mar 7, 2015 at 1:54 PM, Mike Axiak <mike@axiak.net> wrote:

> What if you install the thrift server locally on every C++ client
> machine? I'd imagine performance should be similar to native java
> performance at that point.
>
> -Mike
>
> On Sat, Mar 7, 2015 at 4:49 PM, Michael Segel <michael_segel@hotmail.com>
> wrote:
> > Or you could try a java connection wrapped by JNI so you can call it
> > from your C++ app.
> >
> >> On Mar 7, 2015, at 1:00 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> >>
> >> You can try the REST gateway, though it has the same basic architecture
> >> as the thrift gateway. Maybe the details work out in your favor over
> >> REST.
> >>
> >> On Fri, Mar 6, 2015 at 11:31 PM, nidmgg <nidmgg@gmail.com> wrote:
> >>
> >>> Stack,
> >>>
> >>> Thanks for the quick response. Well, the extra layer really kills
> >>> performance. The 'hop' is so expensive.
> >>>
> >>> Is there another C/C++ API to try out? I saw there is a JIRA,
> >>> HBASE-1015, but it has been inactive for a while.
> >>>
> >>> Demai
> >>>
> >>> Stack <stack@duboce.net> wrote:
> >>>
> >>>> Is it because of the 'hop'?  Java goes against RS. The thrift C++
> >>>> goes to a thriftserver which hosts a java client and then it goes to
> >>>> the RS?
> >>>> St.Ack
> >>>>
> >>>> On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni <nidmgg@gmail.com> wrote:
> >>>>
> >>>>> hi, guys,
> >>>>>
> >>>>> I am trying to get a rough idea of how the C++ and Java clients
> >>>>> compare in performance when accessing an HBase table, and was
> >>>>> surprised to find out that Thrift (C++) is 4X slower.
> >>>>>
> >>>>> The performance result is:
> >>>>> C++:  real    *16m11.313s*; user    5m3.642s; sys    2m21.388s
> >>>>> Java: real    *4m6.012s*;user    0m31.228s; sys    0m8.018s
> >>>>>
> >>>>>
> >>>>> I have a single node HBase(98.6) cluster, with 1X TPCH loaded, and
> >>>>> use the largest table: lineitem, which has 6M rows, roughly 600MB
> >>>>> data.
> >>>>>
> >>>>> For the C++ client, I used the Thrift example provided by
> >>>>> hbase-examples; the C++ code looks like:
> >>>>>
> >>>>>> std::string t("lineitem");
> >>>>>> int scanner =  client.scannerOpenWithScan(t, tscan, dummyAttributes);
> >>>>>> int count = 0;
> >>>>>> ..
> >>>>>> while (true) {
> >>>>>>   std::vector<TRowResult> value;
> >>>>>>   client.scannerGet(value, scanner);
> >>>>>>   if (value.size() == 0) break;
> >>>>>>   count ++;
> >>>>>> }
> >>>>>>
> >>>>>> std::cout << count << " rows scanned"<< std::endl;
> >>>>>>
> >>>>>
> >>>>> The Java client is the simplest one:
> >>>>>
> >>>>>>    HTable table = new HTable(conf,"lineitem");
> >>>>>>
> >>>>>>    Scan scan = new Scan();
> >>>>>>    ResultScanner resScanner;
> >>>>>>    resScanner = table.getScanner(scan);
> >>>>>>    int count = 0;
> >>>>>>    for (Result res: resScanner) {
> >>>>>>      count ++;
> >>>>>>    }
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Since most of the time should be on I/O, I don't expect any
> >>>>> significant difference between Thrift(C++) and Java. Any ideas?
> >>>>> Many thanks
> >>>>>
> >>>>> Demai
> >>>>>
> >>>
> >
> > The opinions expressed here are mine, while they may reflect a cognitive
> > thought, that is purely accidental.
> > Use at your own risk.
> > Michael Segel
> > michael_segel (AT) hotmail.com
> >
> >
> >
> >
> >
>
