hbase-user mailing list archives

From "leiwangouc@gmail.com" <leiwang...@gmail.com>
Subject Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor
Date Sat, 18 Jan 2014 04:34:45 GMT
Hi Lars,

public class AggregationCountForMultiFilter {

private static final byte[] TABLE_NAME = Bytes.toBytes("userdigest");
private static final byte[] CF = Bytes.toBytes("cf");
private static final byte[] FAKE_VLAUE = Bytes.toBytes("DOESNOTEXIST");

public static void main(String[] args) {

Configuration conf = new Configuration();
Configuration configuration = HBaseConfiguration.create(conf);
AggregationClient aggregationClient = new AggregationClient(configuration);

byte[] colA = Bytes.toBytes("tags");
byte[] colB = Bytes.toBytes("googleid");
byte[] colC = Bytes.toBytes("createtime");

List<Filter> filters = new ArrayList<Filter>();

SingleColumnValueFilter filter1 = new SingleColumnValueFilter(CF, colA, CompareOp.NOT_EQUAL,

SingleColumnValueFilter filter2 = new SingleColumnValueFilter(CF, colB, CompareOp.NOT_EQUAL,

SingleColumnValueFilter filter3 = new SingleColumnValueFilter(CF, colC, CompareOp.EQUAL, new

FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, filters);

Scan scan = new Scan();

long rowCount = 0;
try {
rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
} catch (Throwable e) {
System.out.println("rowCount: " + rowCount);

The HBase version is 0.94.6-cdh4.3.1.



From: lars hofhansl
Date: 2014-01-18 11:18
To: user@hbase.apache.org
Subject: Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor
Offhand there is no reason for that.
If you send some sample code that can seed the data and then run the filter that shows the
problem, I'll offer to do some profiling.

Which version of HBase are you using?

-- Lars 

From: "leiwangouc@gmail.com" <leiwangouc@gmail.com>
To: user <user@hbase.apache.org> 
Cc: user <user@hbase.apache.org> 
Sent: Friday, January 17, 2014 5:24 PM
Subject: Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor


I have tried.
For a table with about 600 million row keys, passing just a single QualifierFilter returns
the result quickly.
But when I add SingleColumnValueFilters in a FilterList, it becomes unbearably slow.

I think I can write my own custom aggregation client. Is there any example or user guide
on how to write a custom aggregation client using a coprocessor?



From: Ted Yu
Date: 2014-01-17 18:03
To: user@hbase.apache.org
CC: user
Subject: Re: How to quickly count the rows that meet several conditions using hbase coprocessor
Take a look at http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.html#rowCount(byte[],%20org.apache.hadoop.hbase.coprocessor.ColumnInterpreter,%20org.apache.hadoop.hbase.client.Scan)

You can pass a custom filter through the Scan parameter.
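
A minimal sketch of passing filters through the Scan, assuming the HBase 0.94 client API and that the AggregateImplementation coprocessor is loaded on the table. The table and column names are taken from the code posted in this thread; the regular expression is a hypothetical placeholder, since the thread does not state the real one. It needs the HBase client jar and a running cluster, so treat it as an illustration rather than a tested program.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilteredRowCountSketch {

    public static void main(String[] args) throws Throwable {
        byte[] table = Bytes.toBytes("userdigest");
        byte[] cf = Bytes.toBytes("cf");
        // A value assumed never to occur, so NOT_EQUAL acts as "column exists".
        byte[] absent = Bytes.toBytes("DOESNOTEXIST");

        // "has tags": without setFilterIfMissing(true), rows lacking the
        // column would pass the filter instead of being dropped.
        SingleColumnValueFilter hasTags = new SingleColumnValueFilter(
                cf, Bytes.toBytes("tags"), CompareOp.NOT_EQUAL, absent);
        hasTags.setFilterIfMissing(true);

        // "has googleid"
        SingleColumnValueFilter hasGoogleId = new SingleColumnValueFilter(
                cf, Bytes.toBytes("googleid"), CompareOp.NOT_EQUAL, absent);
        hasGoogleId.setFilterIfMissing(true);

        // "createtime matches a regular expression" (hypothetical pattern).
        SingleColumnValueFilter createTimeMatches = new SingleColumnValueFilter(
                cf, Bytes.toBytes("createtime"), CompareOp.EQUAL,
                new RegexStringComparator("^2014-01.*"));
        createTimeMatches.setFilterIfMissing(true);

        // Each filter must be added to the list, or the list filters nothing.
        List<Filter> filters = new ArrayList<Filter>();
        filters.add(hasTags);
        filters.add(hasGoogleId);
        filters.add(createTimeMatches);
        FilterList all = new FilterList(FilterList.Operator.MUST_PASS_ALL, filters);

        Scan scan = new Scan();
        scan.addFamily(cf);   // the 0.94 AggregationClient expects exactly one family
        scan.setFilter(all);  // the filter only takes effect once set on the Scan

        Configuration conf = HBaseConfiguration.create();
        AggregationClient client = new AggregationClient(conf);
        long rowCount = client.rowCount(table, new LongColumnInterpreter(), scan);
        System.out.println("rowCount: " + rowCount);
    }
}
```

Note that SingleColumnValueFilter has to read the tested columns of every row in each region, so this remains a full table scan on the server side; the coprocessor only avoids shipping the rows back to the client.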


On Jan 16, 2014, at 11:58 PM, "leiwangouc@gmail.com" <leiwangouc@gmail.com> wrote:

> Hi,
> I know that the HBase coprocessor provides a quick way to count the rows of a table.
> But how can I count the rows that meet several conditions?
> Take this for example.
> I have an HBase table with one column family and several columns. I want to calculate the number of rows that meet 3 conditions:
> has column1
> has column2
> has column3, and the value of column3 satisfies a regular expression
> Thanks,
> Lei
> leiwangouc@gmail.com