hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: How to quickly count the rows that meet several conditions using hbase coprocessor
Date Sat, 18 Jan 2014 06:28:15 GMT
Can you use another string for the fake value?
DOESNOTEXIST is a bit long. It shouldn't be difficult to come up with a single-character string
that doesn't appear in the first two columns.
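
To make the intent concrete, here is a minimal plain-Java sketch (no HBase classes involved) of what the three filters are expected to compute: NOT_EQUAL against a fake value combined with setFilterIfMissing(true) amounts to "the column exists", and the regex comparator is modeled as a substring search, which is an assumption about how it behaves. The class name, the sample rows, and the single-character sentinel "\u0000" are all made up for illustration; pick whatever character really cannot occur in your data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class FakeValueCountSketch {

    // Hypothetical single-character sentinel, assumed never to be a real value.
    public static final String FAKE_VALUE = "\u0000";
    public static final Pattern CREATE_TIME = Pattern.compile("^2014-01-15");

    // Models NOT_EQUAL + setFilterIfMissing(true): the row passes only if
    // the column is present and its value differs from the sentinel.
    public static boolean presentAndNotFake(Map<String, String> row, String column) {
        String value = row.get(column);
        return value != null && !value.equals(FAKE_VALUE);
    }

    // Models MUST_PASS_ALL over the three conditions from the thread.
    public static boolean matches(Map<String, String> row) {
        String createTime = row.get("createtime");
        return presentAndNotFake(row, "tags")
                && presentAndNotFake(row, "googleid")
                && createTime != null
                && CREATE_TIME.matcher(createTime).find();
    }

    public static long countMatching(List<Map<String, String>> rows) {
        long count = 0;
        for (Map<String, String> row : rows) {
            if (matches(row)) {
                count++;
            }
        }
        return count;
    }

    // Builds a row from alternating column/value arguments.
    public static Map<String, String> row(String... columnsAndValues) {
        Map<String, String> row = new HashMap<String, String>();
        for (int i = 0; i + 1 < columnsAndValues.length; i += 2) {
            row.put(columnsAndValues[i], columnsAndValues[i + 1]);
        }
        return row;
    }

    // Made-up sample data: two rows satisfy all three conditions.
    public static List<Map<String, String>> sampleRows() {
        List<Map<String, String>> rows = new ArrayList<Map<String, String>>();
        rows.add(row("tags", "a", "googleid", "g1", "createtime", "2014-01-15 10:00:00"));
        rows.add(row("tags", "b", "createtime", "2014-01-15 11:00:00")); // no googleid
        rows.add(row("tags", "c", "googleid", "g3", "createtime", "2014-01-16 09:00:00")); // wrong day
        rows.add(row("tags", "d", "googleid", "g4", "createtime", "2014-01-15 23:59:59"));
        return rows;
    }

    public static void main(String[] args) {
        System.out.println("rowCount: " + countMatching(sampleRows()));
    }
}
```

The length of the sentinel does not change which rows match; a shorter one only means fewer bytes to ship with the scan and to compare per cell.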

Cheers

On Jan 17, 2014,  at 8:34 PM, "leiwangouc@gmail.com" <leiwangouc@gmail.com> wrote:

> Hi Lars,
> 
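> import java.util.ArrayList;
> import java.util.List;
> 
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
> import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
> import org.apache.hadoop.hbase.filter.Filter;
> import org.apache.hadoop.hbase.filter.FilterList;
> import org.apache.hadoop.hbase.filter.RegexStringComparator;
> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
> import org.apache.hadoop.hbase.util.Bytes;
> 
> public class AggregationCountForMultiFilter {
> 
>     private static final byte[] TABLE_NAME = Bytes.toBytes("userdigest");
>     private static final byte[] CF = Bytes.toBytes("cf");
>     // Sentinel that is assumed never to occur as a real column value.
>     private static final byte[] FAKE_VALUE = Bytes.toBytes("DOESNOTEXIST");
> 
>     public static void main(String[] args) {
> 
>         Configuration conf = new Configuration();
>         Configuration configuration = HBaseConfiguration.create(conf);
>         AggregationClient aggregationClient = new AggregationClient(configuration);
> 
>         byte[] colA = Bytes.toBytes("tags");
>         byte[] colB = Bytes.toBytes("googleid");
>         byte[] colC = Bytes.toBytes("createtime");
> 
>         List<Filter> filters = new ArrayList<Filter>();
> 
>         // NOT_EQUAL against the sentinel plus setFilterIfMissing(true)
>         // keeps only the rows that actually have the column.
>         SingleColumnValueFilter filter1 =
>                 new SingleColumnValueFilter(CF, colA, CompareOp.NOT_EQUAL, FAKE_VALUE);
>         filter1.setFilterIfMissing(true);
>         filters.add(filter1);
> 
>         SingleColumnValueFilter filter2 =
>                 new SingleColumnValueFilter(CF, colB, CompareOp.NOT_EQUAL, FAKE_VALUE);
>         filter2.setFilterIfMissing(true);
>         filters.add(filter2);
> 
>         // The third column must exist and match the regular expression.
>         SingleColumnValueFilter filter3 =
>                 new SingleColumnValueFilter(CF, colC, CompareOp.EQUAL,
>                         new RegexStringComparator("^2014-01-15"));
>         filter3.setFilterIfMissing(true);
>         filters.add(filter3);
> 
>         // A row is counted only if it passes all three filters.
>         FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, filters);
> 
>         Scan scan = new Scan();
>         scan.addFamily(CF);
>         scan.setFilter(filterList);
> 
>         long rowCount = 0;
>         try {
>             rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
>         } catch (Throwable e) {
>             e.printStackTrace();
>         }
>         System.out.println("rowCount: " + rowCount);
>     }
> }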
> 
> The HBase version is 0.94.6-cdh4.3.1.
> 
> Thanks,
> Lei
> 
> 
> 
> leiwangouc@gmail.com
> 
> From: lars hofhansl
> Date: 2014-01-18 11:18
> To: user@hbase.apache.org
> Subject: Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor
> Offhand there is no reason for that.
> If you send some sample code that can seed the data and then run the filter that shows the problem, I'll offer to do some profiling.
> 
> Which version of HBase are you using?
> 
> -- Lars 
> 
> 
> ________________________________
> From: "leiwangouc@gmail.com" <leiwangouc@gmail.com>
> To: user <user@hbase.apache.org> 
> Cc: user <user@hbase.apache.org> 
> Sent: Friday, January 17, 2014 5:24 PM
> Subject: Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor
> 
> Hi, 
> 
> I have tried.
> For a table with about 600 million row keys, passing just a single QualifierFilter returns the result quickly.
> But when I add the SingleColumnValueFilter with a FilterList, it becomes too slow to be usable.
> 
> I think I can write my own custom aggregation client. Is there any example or user guide on how to write a custom aggregation client using a coprocessor?
> 
> Thanks,
> Lei
> 
> 
> 
> 
> leiwangouc@gmail.com
> 
> From: Ted Yu
> Date: 2014-01-17 18:03
> To: user@hbase.apache.org
> CC: user
> Subject: Re: How to quickly count the rows that meet several conditions using hbase coprocessor
> Take a look at http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.html#rowCount(byte[],%20org.apache.hadoop.hbase.coprocessor.ColumnInterpreter,%20org.apache.hadoop.hbase.client.Scan)
> 
> You can pass a custom filter through the Scan parameter.
> 
> Cheers
> 
> On Jan 16, 2014, at 11:58 PM, "leiwangouc@gmail.com" <leiwangouc@gmail.com> wrote:
> 
>> Hi,
>> 
>> I know that an HBase coprocessor provides a quick way to count the rows of a table.
>> But how can I count the rows that meet several conditions?
>> 
>> Take this for example. 
>> I have an HBase table with one column family and several columns. I want to calculate the number of rows that meet 3 conditions:
>> has column1
>> has column2
>> has column3, and the value of column3 matches a regular expression
>> 
>> Thanks,
>> Lei
>> 
>> 
>> 
>> leiwangouc@gmail.com
