hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: significant scan performance difference between Thrift(c++) and Java: 4X slower
Date Sat, 07 Mar 2015 04:31:24 GMT
Is it because of the 'hop'? The Java client goes straight to the RS (RegionServer).
The Thrift C++ client goes to a Thrift server, which hosts a Java client, and that
client then goes to the RS?
St.Ack
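One way to picture the cost of that hop: in the quoted C++ loop, each `scannerGet` call is its own round trip from the C++ client to the Thrift server and returns at most one row (the Thrift1 interface also has `scannerGetList(id, nbRows)`, which can return several rows per call). The Java client's scanner, by contrast, pulls rows from the RS in batches sized by scanner caching. The program below is a toy model, not HBase or Thrift code: `FakeScanner`, `getOne` and `getBatch` are hypothetical names, it only counts calls, and the batch size of 100 is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not the HBase or Thrift API): a scanner over `totalRows` rows
// that counts how many client round trips a full scan needs.
public class RoundTripModel {
    static final class FakeScanner {
        private final int totalRows;
        private int next = 0;
        int roundTrips = 0;

        FakeScanner(int totalRows) { this.totalRows = totalRows; }

        // Hypothetical analogue of Thrift1 scannerGet: one call, at most one row.
        List<Integer> getOne() { return getBatch(1); }

        // Hypothetical analogue of scannerGetList(id, nbRows): one call, up to n rows.
        List<Integer> getBatch(int n) {
            roundTrips++;  // every call pays one network round trip
            List<Integer> rows = new ArrayList<>();
            while (rows.size() < n && next < totalRows) {
                rows.add(next++);
            }
            return rows;
        }
    }

    // Drains the scanner the way the quoted C++ loop does: one row per call.
    static int scanOneAtATime(FakeScanner s) {
        int count = 0;
        while (!s.getOne().isEmpty()) {
            count++;
        }
        return count;
    }

    // Drains the scanner with up to `batch` rows per call.
    static int scanBatched(FakeScanner s, int batch) {
        int count = 0;
        while (true) {
            List<Integer> rows = s.getBatch(batch);
            if (rows.isEmpty()) break;  // empty result: scan finished
            count += rows.size();
        }
        return count;
    }

    public static void main(String[] args) {
        int rows = 6_000_000;  // approximate lineitem row count from the thread
        FakeScanner single = new FakeScanner(rows);
        FakeScanner batched = new FakeScanner(rows);
        System.out.println(scanOneAtATime(single) + " rows, " + single.roundTrips + " round trips");
        System.out.println(scanBatched(batched, 100) + " rows, " + batched.roundTrips + " round trips");
    }
}
```

Run as is, it reports 6000001 round trips for the one-row loop and 60001 for batches of 100, for the same 6000000 rows (the extra call in each case is the final empty fetch). The model ignores payload size and server-side work, so it only illustrates why per-call overhead through an extra hop can dominate a full-table scan; it does not predict the actual slowdown.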

On Fri, Mar 6, 2015 at 4:46 PM, Demai Ni <nidmgg@gmail.com> wrote:

> hi, guys,
>
> I am trying to get a rough idea of how the C++ and Java clients compare in
> performance when accessing an HBase table, and was surprised to find that
> Thrift (C++) is 4X slower.
>
> The performance results are:
> C++:  real *16m11.313s*; user 5m3.642s;  sys 2m21.388s
> Java: real  *4m6.012s*;  user 0m31.228s; sys 0m8.018s
>
>
> I have a single-node HBase (0.98.6) cluster with TPC-H loaded at scale factor 1
> (1X), and I use the largest table, lineitem, which has 6M rows, roughly 600MB of data.
>
> For the C++ client, I used the Thrift example provided in hbase-examples; the
> C++ code looks like:
>
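> >  std::string t("lineitem");
> >  int scanner = client.scannerOpenWithScan(t, tscan, dummyAttributes);
> >  int count = 0;
> > ..
> >  while (true) {
> >    std::vector<TRowResult> value;
> >    // scannerGet returns at most one row per call, so every iteration
> >    // is a separate round trip to the Thrift server
> >    client.scannerGet(value, scanner);
> >    if (value.empty()) break;  // an empty result means the scan is done
> >    ++count;
> >  }
> >  client.scannerClose(scanner);  // release the scanner on the Thrift server
> >
> >  std::cout << count << " rows scanned" << std::endl;
> >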
>
> The Java client is the simplest one:
>
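> >     // conf: an HBase Configuration, e.g. from HBaseConfiguration.create()
> >     HTable table = new HTable(conf, "lineitem");
> >
> >     Scan scan = new Scan();
> >     ResultScanner resScanner = table.getScanner(scan);
> >     int count = 0;
> >     for (Result res : resScanner) {
> >       count++;
> >     }
> >     resScanner.close();  // release the server-side scanner
> >     table.close();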
> >
>
>
>
> Since most of the time should be spent on I/O, I didn't expect any significant
> difference between Thrift (C++) and Java. Any ideas? Many thanks
>
> Demai
>
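The `time` output in the original post can be turned into per-row figures. This is back-of-envelope arithmetic only, assuming exactly 6,000,000 rows (the post says "6M"); the class and method names are illustrative. It converts each minutes/seconds value to seconds, divides wall-clock time by the row count, and adds user and sys time to get the CPU time spent by each client process.

```java
// Back-of-envelope arithmetic on the `time` output quoted in the thread.
// Assumes exactly 6,000,000 rows; the post only says "6M".
public class ScanTimings {
    static final long ROWS = 6_000_000L;

    // Converts a `time`-style value given as minutes and seconds into seconds.
    static double seconds(int minutes, double secs) {
        return minutes * 60 + secs;
    }

    // Microseconds per row for a total duration given in seconds.
    static double microsPerRow(double totalSeconds) {
        return totalSeconds / ROWS * 1e6;
    }

    public static void main(String[] args) {
        double cppReal = seconds(16, 11.313);
        double javaReal = seconds(4, 6.012);
        double cppCpu = seconds(5, 3.642) + seconds(2, 21.388);   // user + sys
        double javaCpu = seconds(0, 31.228) + seconds(0, 8.018);  // user + sys

        System.out.printf("wall: C++ %.1f us/row, Java %.1f us/row, ratio %.2f%n",
                microsPerRow(cppReal), microsPerRow(javaReal), cppReal / javaReal);
        System.out.printf("client CPU: C++ %.1f s, Java %.1f s, ratio %.1f%n",
                cppCpu, javaCpu, cppCpu / javaCpu);
    }
}
```

With these numbers it prints about 161.9 vs 41.0 us of wall time per row (ratio 3.95) and 445.0 s vs 39.2 s of client CPU (ratio 11.3). So the C++ client process alone is busy on the CPU for roughly 46% of its 971 s run, while the Java client uses about 39 s. One reading is that per-call work on the client side (one Thrift call and one result decode per row in the loop above) matters more than disk I/O here, although these numbers alone cannot confirm that.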
