hbase-user mailing list archives

From Andrey Stepachev <oct...@gmail.com>
Subject Re: significant scan performance difference between Thrift(c++) and Java: 4X slower
Date Mon, 09 Mar 2015 22:41:26 GMT
Sorry Demai, I currently don't have access to that code.

But from what you describe, it seems you are using
Thrift v1. I'd recommend using thrift2.

Also, it is a good idea to check the Thrift server configuration:
1. server type (blocking/nonblocking/hsha), and whether the transport is framed
2. size of the thread pool



On Mon, Mar 9, 2015 at 9:26 PM, Demai Ni <nidmgg@gmail.com> wrote:

> Andrey and all,
>
> thanks for the input. Andrey, if possible, would you mind sharing your code
> segment so I can follow the settings on your side?
>
> I had exactly the same thought when I first saw the result. I was
> expecting a small performance penalty (10~20%) when using Thrift (C++),
> but not this much.
>
> Now I am looking into the C++ API calls. Originally, I used
> "client.scannerGet(value, scanner)", which does a lot of preparation
> work (like a flush) for each call. I just changed the code to use
> "client.scannerGetList(value, scanner, 10000);". Sure enough, the
> performance improved. However, for a like-for-like comparison, I also set
> the Java client's batch and caching to 10000. Here is the new code:
>
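> *C++*
>     TScan tscan;
>     int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
>     int count = 0;
>     try {
>       while (true) {
>         std::vector<TRowResult> value;
>         // fetch up to 10000 rows per Thrift call
>         client.scannerGetList(value, scanner, 10000);
>         if (value.empty()) break;  // an empty batch means the scan is done
>         count += value.size();
>       }
>       client.scannerClose(scanner);
>     } catch (const apache::thrift::TException &tx) {
>       std::cerr << "ERROR: " << tx.what() << std::endl;
>     }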
>
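> *Java*
>     Scan scan = new Scan();
>     scan.setCaching(10000);  // rows fetched per RPC to the region server
>     scan.setBatch(10000);    // max cells (columns) returned per Result
>     ResultScanner resScanner = table.getScanner(scan);
>     int count = 0;
>     for (Result res : resScanner) {
>       count++;
>     }
>     resScanner.close();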
>
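The speed-up from switching scannerGet to scannerGetList is presumably largely a matter of round trips: every call is one request/response between the C++ client and the Thrift server. The sketch below counts calls for the same "fetch until an empty batch" loop used above. It makes one assumption: a hypothetical in-memory `MockScanner` (and its `getList` method) stands in for the generated Thrift client, so no Thrift headers are needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a server-side scanner (NOT the Thrift API):
// it hands out rows in batches and records how many calls were made.
class MockScanner {
 public:
  explicit MockScanner(std::size_t totalRows) : remaining_(totalRows) {}

  // Same shape as scannerGetList: fills `out` with up to `nbRows` rows and
  // returns an empty batch once the scan is exhausted.
  void getList(std::vector<int>& out, std::size_t nbRows) {
    ++calls_;
    out.clear();
    std::size_t n = nbRows < remaining_ ? nbRows : remaining_;
    out.assign(n, 0);  // row contents are irrelevant for counting
    remaining_ -= n;
  }

  std::size_t calls() const { return calls_; }

 private:
  std::size_t remaining_;
  std::size_t calls_ = 0;
};

// Drains the scanner in batches of `nbRows` and returns the rows seen;
// scanner.calls() then gives the number of round trips.
std::size_t drain(MockScanner& scanner, std::size_t nbRows) {
  assert(nbRows > 0);
  std::size_t count = 0;
  std::vector<int> value;
  while (true) {
    scanner.getList(value, nbRows);
    if (value.empty()) break;
    count += value.size();
  }
  return count;
}
```

For about 6M rows, one row per call means 6,000,001 calls (including the final empty one). Batches of 10000 need only 601 calls, and each call saved avoids one client/Thrift-server round trip.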
> So both clients improved as expected, but the Thrift C++ client still takes
> nearly 3X the time of the Java client:
> C++ : real    6m46.845s, user    1m59.636s, sys    0m11.984s
> Java: real    2m27.245s, user    0m17.624s, sys    0m4.779s
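The two timing lines can be reduced to ratios. The sketch below assumes the `<minutes>m<seconds>s` format printed by bash's `time` keyword. The names `toSeconds`, `Timing`, `parse`, `wallRatio` and `clientCpu` are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts a bash `time` field such as "6m46.845s" into seconds.
// Assumes the "<minutes>m<seconds>s" layout.
double toSeconds(const std::string& field) {
  std::size_t m = field.find('m');
  assert(m != std::string::npos && field.back() == 's');
  double minutes = std::stod(field.substr(0, m));
  double seconds = std::stod(field.substr(m + 1, field.size() - m - 2));
  return minutes * 60.0 + seconds;
}

struct Timing {
  double real, user, sys;  // seconds
};

Timing parse(const std::string& real, const std::string& user,
             const std::string& sys) {
  return {toSeconds(real), toSeconds(user), toSeconds(sys)};
}

// Wall-clock ratio between two runs.
double wallRatio(const Timing& a, const Timing& b) { return a.real / b.real; }

// CPU time spent in the client process itself (user + sys).
double clientCpu(const Timing& t) { return t.user + t.sys; }
```

On these numbers the wall-clock ratio is about 2.76 (406.8 s vs 147.2 s). The client's own CPU time is about 131.6 s vs 22.4 s. The rest of each run's wall time is time the client spends waiting, and client-side `time` does not show where that time goes (Thrift server, region server, or network).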
>
> To be fair, I was able to setCaching on the Java client, but didn't find a way
> to do the same through the C++ API, which also makes some difference.
>
> Demai
>
>
> On Sun, Mar 8, 2015 at 1:40 PM, Andrey Stepachev <octo47@gmail.com> wrote:
>
> > Hi Demai.
> >
> > That seems odd to me; in my tests I got very similar performance.
> > I'd suggest checking that the scans have identical parameters
> > (cache size in particular). That alone can produce very different
> > performance in your case.
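One way to follow this advice is to write down the scan knobs of each client side by side before timing anything. The sketch below is a hypothetical checklist. Neither HBase client exposes a struct like `ScanSettings`, and the `differences` helper is illustrative, so the values have to be filled in by hand from each client's code and configuration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of the knobs that most affect a full-table scan.
struct ScanSettings {
  int caching;       // rows fetched per round trip
  int batch;         // max cells per Result (here, -1 means unset)
  bool cacheBlocks;  // whether scanned blocks go into the block cache
};

// Lists which settings differ, so a benchmark compares like with like.
std::vector<std::string> differences(const ScanSettings& a,
                                     const ScanSettings& b) {
  std::vector<std::string> diffs;
  if (a.caching != b.caching) diffs.push_back("caching");
  if (a.batch != b.batch) diffs.push_back("batch");
  if (a.cacheBlocks != b.cacheBlocks) diffs.push_back("cacheBlocks");
  return diffs;
}
```

For example, a Java scan with caching and batch of 10000 compared against a one-row-per-call Thrift loop with batch unset reports `caching` and `batch` as mismatches.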
> >
> > Thanks.
> >
> > On Sun, Mar 8, 2015 at 6:50 PM, Mike Axiak <mike@axiak.net> wrote:
> >
> > > If you're going the JNI route, the best bet is to embed a VM in your C
> > > project. You use "javap -s -p" to get the required method signatures
> > > and compile, linking against the Java (JVM) library. This article talks
> > > about how to call Java from C:
> > >
> > > http://www.codeproject.com/Articles/22881/How-to-Call-Java-Functions-from-C-Using-JNI
> > >
> > > Best,
> > > Mike
> > >
> > > On Sun, Mar 8, 2015 at 10:29 AM, Michael Segel
> > > <michael_segel@hotmail.com> wrote:
> > > > JNI example?
> > > >
> > > > I don’t have one… my clients own the code, so I can’t take it with
> > > > me and share.
> > > > (The joys of being a consultant: you can’t take it with you, and you
> > > > need to make sure you don’t transfer IP accidentally.)
> > > >
> > > >
> > > > Maybe in one of the HBase books? Or just google for a JNI example on
> > > > the web, since it’s straightforward Java code to connect to HBase and
> > > > then straight JNI to talk to C/C++.
> > > >
> > > >
> > > >> On Mar 7, 2015, at 5:56 PM, Demai Ni <nidmgg@gmail.com> wrote:
> > > >>
> > > >> Nick, thanks. I will give REST a try. However, if it uses the same
> > > >> design, the result will probably be the same.
> > > >>
> > > >> Michael, I was thinking about the same thing through JNI. Is there an
> > > >> example I can follow?
> > > >>
> > > >> Mike (Axiak), I run the C++ client on the same Linux machine as HBase
> > > >> and Thrift. HBase uses IP 127.0.0.1 and Thrift uses 0.0.0.0. That
> > > >> doesn't make a difference, does it?
> > > >>
> > > >> Anyway, Thrift gets the scan result from HBase first, and then my C++
> > > >> client gets the same data from Thrift. That probably costs about double
> > > >> the time/CPU. So JNI may be the right way to go. Is there an example
> > > >> I can use? Thanks
> > > >>
> > > >> Demai
> > > >>
> > > >> On Sat, Mar 7, 2015 at 1:54 PM, Mike Axiak <mike@axiak.net> wrote:
> > > >>
> > > >>> What if you install the thrift server locally on every C++ client
> > > >>> machine? I'd imagine performance should be similar to native java
> > > >>> performance at that point.
> > > >>>
> > > >>> -Mike
> > > >>>
> > > >>> On Sat, Mar 7, 2015 at 4:49 PM, Michael Segel <michael_segel@hotmail.com>
> > > >>> wrote:
> > > >>>> Or you could try a Java connection wrapped by JNI so you can call it
> > > >>>> from your C++ app.
> > > >>>>
> > > >>>>> On Mar 7, 2015, at 1:00 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> > > >>>>>
> > > >>>>> You can try the REST gateway, though it has the same basic
> > > >>>>> architecture as the Thrift gateway. Maybe the details work out in
> > > >>>>> your favor over REST.
> > > >>>>>
> > > >>>>> On Fri, Mar 6, 2015 at 11:31 PM, nidmgg <nidmgg@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> Stack,
> > > >>>>>>
> > > >>>>>> Thanks for the quick response. Well, the extra layer really kills
> > > >>>>>> the performance. The 'hop' is very expensive.
> > > >>>>>>
> > > >>>>>> Is there another C/C++ API to try out? I saw there is a JIRA,
> > > >>>>>> HBASE-1015, but it has been inactive for a while.
> > > >>>>>>
> > > >>>>>> Demai
> > > >>>>>>
> > > >>>>>> Stack <stack@duboce.net> wrote:
> > > >>>>>>
> > > >>>>>>> Is it because of the 'hop'? Java goes against the RS directly. The
> > > >>>>>>> Thrift C++ client goes to a thriftserver, which hosts a Java client,
> > > >>>>>>> and then that goes to the RS?
> > > >>>>>>> St.Ack
> > > >>>>>>>
> > > >>>>>>> On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni <nidmgg@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>>> hi, guys,
> > > >>>>>>>>
> > > >>>>>>>> I am trying to get a rough idea of the performance difference
> > > >>>>>>>> between C++ and Java clients when accessing an HBase table, and was
> > > >>>>>>>> surprised to find that Thrift (C++) is 4X slower.
> > > >>>>>>>>
> > > >>>>>>>> The performance result is:
> > > >>>>>>>> C++:  real    *16m11.313s*; user    5m3.642s;  sys    2m21.388s
> > > >>>>>>>> Java: real    *4m6.012s*;   user    0m31.228s; sys    0m8.018s
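The original C++ loop makes one scannerGet call per row, so the gap can also be expressed per row. The sketch below is a back-of-the-envelope estimate under two assumptions: the ~6M-row count given in the setup that follows, and that the whole wall-clock difference is spread evenly over the rows, which ignores any other differences between the runs. The function name `extraMicrosPerRow` is illustrative.

```cpp
#include <cassert>

// Rough extra wall time per row on the Thrift path, in microseconds,
// assuming the whole gap between the two runs is spread evenly over rows.
double extraMicrosPerRow(double thriftSeconds, double javaSeconds,
                         double rows) {
  assert(rows > 0);
  return (thriftSeconds - javaSeconds) / rows * 1e6;
}
```

With 971.3 s vs 246.0 s over ~6M rows, this comes to roughly 121 µs of extra wall time per row, which here also means per scannerGet call.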
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> I have a single-node HBase (0.98.6) cluster with TPC-H scale factor 1
> > > >>>>>>>> loaded, and use the largest table, lineitem, which has 6M rows,
> > > >>>>>>>> roughly 600MB of data.
> > > >>>>>>>>
> > > >>>>>>>> For the C++ client, I used the Thrift example provided by
> > > >>>>>>>> hbase-examples; the C++ code looks like:
> > > >>>>>>>>
> > > >>>>>>>>> std::string t("lineitem");
> > > >>>>>>>>> int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
> > > >>>>>>>>> int count = 0;
> > > >>>>>>>>> ..
> > > >>>>>>>>> while (true) {
> > > >>>>>>>>>   std::vector<TRowResult> value;
> > > >>>>>>>>>   client.scannerGet(value, scanner);  // at most one row per call
> > > >>>>>>>>>   if (value.empty()) break;          // scan exhausted
> > > >>>>>>>>>   count++;
> > > >>>>>>>>> }
> > > >>>>>>>>> client.scannerClose(scanner);
> > > >>>>>>>>>
> > > >>>>>>>>> std::cout << count << " rows scanned" << std::endl;
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> For the Java client, I used the simplest possible code:
> > > >>>>>>>>
> > > >>>>>>>>>   HTable table = new HTable(conf, "lineitem");
> > > >>>>>>>>>
> > > >>>>>>>>>   Scan scan = new Scan();  // default caching from the client config
> > > >>>>>>>>>   ResultScanner resScanner = table.getScanner(scan);
> > > >>>>>>>>>   int count = 0;
> > > >>>>>>>>>   for (Result res : resScanner) {
> > > >>>>>>>>>     count++;
> > > >>>>>>>>>   }
> > > >>>>>>>>>   resScanner.close();
> > > >>>>>>>>>   table.close();
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> Since most of the time should be spent on I/O, I didn't expect any
> > > >>>>>>>> significant difference between Thrift (C++) and Java. Any ideas?
> > > >>>>>>>> Many thanks.
> > > >>>>>>>>
> > > >>>>>>>> Demai
> > > >>>>>>>>
> > > >>>>>>
> > > >>>>
> > > >>>> The opinions expressed here are mine, while they may reflect
a
> > > cognitive
> > > >>> thought, that is purely accidental.
> > > >>>> Use at your own risk.
> > > >>>> Michael Segel
> > > >>>> michael_segel (AT) hotmail.com
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>
> > > >
> > >
> >
> >
> >
> > --
> > Andrey.
> >
>



-- 
Andrey.
