hbase-user mailing list archives

From yuzhih...@gmail.com
Subject Re: Hbase Count Aggregate Function
Date Tue, 25 Dec 2012 16:57:03 GMT
The rowCount method accepts a Scan object, to which you can attach your custom filter.
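
To make the difference discussed in this thread concrete, here is a minimal sketch in plain Java. It is a toy model only and does not use the HBase API: the class FilteredRowCountSketch and its methods sampleRows, valueContains and countRows are hypothetical names invented for the illustration. It shows that a count which consults the filter returns only the matching rows, while a count that is given no filter (or ignores it) returns every row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model, NOT the HBase API: all names here are hypothetical.
public class FilteredRowCountSketch {

    // Builds an in-memory "table": each row maps "family:qualifier" to a value.
    public static List<Map<String, String>> sampleRows(int cardiac, int renal) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < cardiac; i++) {
            rows.add(row("cardiac"));
        }
        for (int i = 0; i < renal; i++) {
            rows.add(row("renal"));
        }
        return rows;
    }

    private static Map<String, String> row(String diagnosis) {
        Map<String, String> r = new HashMap<>();
        r.put("info:diagnosis", diagnosis);
        return r;
    }

    // Stand-in for a column-value filter: keeps rows whose column contains the text.
    public static Predicate<Map<String, String>> valueContains(String column, String text) {
        return r -> {
            String value = r.get(column);
            return value != null && value.contains(text);
        };
    }

    // Counts rows; a null filter means "count every row".
    public static long countRows(List<Map<String, String>> rows,
                                 Predicate<Map<String, String>> filter) {
        long count = 0;
        for (Map<String, String> r : rows) {
            if (filter == null || filter.test(r)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = sampleRows(5, 5);
        // Count that ignores the filter: every row is counted.
        System.out.println("unfiltered: " + countRows(rows, null));
        // Count that applies the filter: only matching rows are counted.
        System.out.println("filtered:   "
                + countRows(rows, valueContains("info:diagnosis", "cardiac")));
    }
}
```

If the count you get back equals the total number of rows, that suggests the filter on the scan was not applied on the counting path.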

Cheers



On Dec 25, 2012, at 8:42 AM, Dalia Sobhy <dalia.mohsobhy@hotmail.com> wrote:

> 
> Do you mean I should implement a new rowCount method in the AggregationClient class?
> 
> I don't understand. Could you illustrate with a code sample, Ram?
> 
>>> Date: Tue, 25 Dec 2012 00:21:14 +0530
>>> Subject: Re: Hbase Count Aggregate Function
>>> From: ramkrishna.s.vasudevan@gmail.com
>>> To: user@hbase.apache.org
>>> 
>>> Hi
>>> You could implement a custom filter similar to FirstKeyOnlyFilter.
>>> Implement its filterKeyValue method so that it matches your KeyValue
>>> (the specific qualifier that you are looking for).
>>> 
>>> Deploy it on your cluster. It should work.
>>> 
>>> Regards
>>> Ram
>>> 
>>> On Mon, Dec 24, 2012 at 10:35 PM, Dalia Sobhy <dalia.mohsobhy@hotmail.com> wrote:
>>> 
>>>> 
>>>> So do you have a suggestion for how to get the filter to work?
>>>> 
>>>>> Date: Mon, 24 Dec 2012 22:22:49 +0530
>>>>> Subject: Re: Hbase Count Aggregate Function
>>>>> From: ramkrishna.s.vasudevan@gmail.com
>>>>> To: user@hbase.apache.org
>>>>> 
>>>>> Okay, looking at the shell script and the code, I think that when you use
>>>>> this counter, the user's filter is not taken into account.
>>>>> It adds a FirstKeyOnlyFilter and proceeds with the scan. :(
>>>>> 
>>>>> Regards
>>>>> Ram
>>>>> 
>>>>> On Mon, Dec 24, 2012 at 10:11 PM, Dalia Sobhy <dalia.mohsobhy@hotmail.com> wrote:
>>>>> 
>>>>>> 
>>>>>> Yes, scan gives the correct number of rows, while count returns the
>>>>>> total number of rows.
>>>>>> 
>>>>>> Both use the same filter. I even tried it with the Java API, using the
>>>>>> rowCount method.
>>>>>> 
>>>>>> rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan);
>>>>>> 
>>>>>> I get the total number of rows, not the number of rows that pass the filter.
>>>>>> 
>>>>>> Any ideas?
>>>>>> 
>>>>>> Thanks Ram :)
>>>>>> 
>>>>>>> Date: Mon, 24 Dec 2012 21:57:54 +0530
>>>>>>> Subject: Re: Hbase Count Aggregate Function
>>>>>>> From: ramkrishna.s.vasudevan@gmail.com
>>>>>>> To: user@hbase.apache.org
>>>>>>> 
>>>>>>> So you find that scan with a filter and count with the same filter are
>>>>>>> giving you different results?
>>>>>>> 
>>>>>>> Regards
>>>>>>> Ram
>>>>>>> 
>>>>>>> On Mon, Dec 24, 2012 at 8:33 PM, Dalia Sobhy <dalia.mohsobhy@hotmail.com> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>> Dear all,
>>>>>>>> 
>>>>>>>> I have 50,000 rows with diagnosis qualifier = "cardiac", and another
>>>>>>>> 50,000 rows with "renal".
>>>>>>>> 
>>>>>>>> When I type this in the HBase shell:
>>>>>>>> 
>>>>>>>> import org.apache.hadoop.hbase.filter.CompareFilter
>>>>>>>> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
>>>>>>>> import org.apache.hadoop.hbase.filter.SubstringComparator
>>>>>>>> import org.apache.hadoop.hbase.util.Bytes
>>>>>>>> 
>>>>>>>> scan 'patient', { COLUMNS => "info:diagnosis", FILTER =>
>>>>>>>>    SingleColumnValueFilter.new(Bytes.toBytes('info'),
>>>>>>>>         Bytes.toBytes('diagnosis'),
>>>>>>>>         CompareFilter::CompareOp.valueOf('EQUAL'),
>>>>>>>>         SubstringComparator.new('cardiac'))}
>>>>>>>> 
>>>>>>>> Output = 50,000 rows
>>>>>>>> 
>>>>>>>> import org.apache.hadoop.hbase.filter.CompareFilter
>>>>>>>> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
>>>>>>>> import org.apache.hadoop.hbase.filter.SubstringComparator
>>>>>>>> import org.apache.hadoop.hbase.util.Bytes
>>>>>>>> 
>>>>>>>> count 'patient', { COLUMNS => "info:diagnosis", FILTER =>
>>>>>>>>    SingleColumnValueFilter.new(Bytes.toBytes('info'),
>>>>>>>>         Bytes.toBytes('diagnosis'),
>>>>>>>>         CompareFilter::CompareOp.valueOf('EQUAL'),
>>>>>>>>         SubstringComparator.new('cardiac'))}
>>>>>>>> Output = 100,000 rows
>>>>>>>> 
>>>>>>>> I get the same result even when I use the HBase Java API with an
>>>>>>>> AggregationClient instance, and I enabled the aggregation coprocessor
>>>>>>>> for the table:
>>>>>>>> rowCount = aggregationClient.rowCount(TABLE_NAME, null, scan)
>>>>>>>> 
>>>>>>>> Also, when I measure performance after adding more nodes, the
>>>>>>>> operation takes the same amount of time.
>>>>>>>> 
>>>>>>>> So any advice please?
>>>>>>>> 
>>>>>>>> I have been struggling with this mess for a couple of weeks.
>>>>>>>> 
>>>>>>>> Thanks,
