hbase-user mailing list archives

From Tony Dean <Tony.D...@sas.com>
Subject Scan performance
Date Fri, 21 Jun 2013 21:08:32 GMT
Hi,

I hope that you can shed some light on these 2 scenarios below.

I have 2 small tables of 6000 rows.
Table 1 has only 1 column in each of its rows.
Table 2 has 40 columns in each of its rows.
Other than that the two tables are identical.

In both tables there is only 1 row that contains a matching column that I am filtering on. And the Scan performs correctly in both cases by returning only the single result.

The code looks something like the following:

// the start/stop rows should include all 6000 rows
Scan scan = new Scan(startRow, stopRow);
// only return the column that I am interested in (should only be in 1 row and only 1 version)
scan.addColumn(cf, qualifier);

Filter f1 = new InclusiveStopFilter(stopRow);
Filter f2 = new SingleColumnValueFilter(cf, qualifier, CompareFilter.CompareOp.EQUAL, value);
scan.setFilter(new FilterList(f1, f2));

scan.setTimeRange(0, Long.MAX_VALUE);
scan.setMaxVersions(1);

ResultScanner rs = t.getScanner(scan);
for (Result result: rs)
{
  ...
}

For table 1, rs.next() takes about 30ms.
For table 2, rs.next() takes about 180ms.
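
For anyone who wants to reproduce these numbers, here is a minimal sketch of how a single call can be timed. CallTimer and Timed are hypothetical names for illustration, not part of HBase, and the sleep in main is only a stand-in for rs.next() so the sketch runs without a cluster. It assumes a Java version with lambdas.

```java
import java.util.concurrent.Callable;

class CallTimer {
    /** Holds the value a call returned and how long the call took. */
    static final class Timed<T> {
        final T value;
        final long elapsedNanos;

        Timed(T value, long elapsedNanos) {
            this.value = value;
            this.elapsedNanos = elapsedNanos;
        }

        double elapsedMillis() {
            return elapsedNanos / 1_000_000.0;
        }
    }

    /** Runs the call once and records wall-clock time with System.nanoTime(). */
    static <T> Timed<T> time(Callable<T> call) {
        long start = System.nanoTime();
        try {
            T value = call.call();
            return new Timed<>(value, System.nanoTime() - start);
        } catch (Exception e) {
            // Checked exceptions (e.g. IOException from a scanner) are rethrown unchecked.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for rs.next(): sleeps briefly, then returns a dummy value.
        Timed<String> t = time(() -> {
            Thread.sleep(20);
            return "row";
        });
        System.out.printf("value=%s took %.1f ms%n", t.value, t.elapsedMillis());
    }
}
```

Against a real table, the call passed to time would be something like () -> rs.next(). Timing the first next() separately from later ones may matter, since the first call is likely to include the server-side work of scanning up to the first match.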

Both are returning the exact same result.  Why is it taking so much longer on table 2 to get
the same result?  The scan depth is the same.  The only difference is the column width.  But
I'm filtering on a single column and returning only that column.

Am I missing something?  As I increase the number of columns, the response time gets worse.
I do expect the response time to get worse when increasing the number of rows, but not when
increasing the number of columns, since I'm returning only 1 column in both cases.

I appreciate any comments that you have.

-Tony



Tony Dean
SAS Institute Inc.
Principal Software Developer
919-531-6704





