hbase-user mailing list archives

From Wojciech Langiewicz <wlangiew...@gmail.com>
Subject Re: Row count without iterating over ResultScanner?
Date Sun, 01 May 2011 18:51:02 GMT
Thanks, that's great. But first I have to update HBase and read some
documentation, so I'll let you know in a while how that works for me.

On 01.05.2011 20:42, Himanshu Vashishtha wrote:
> Yes, you can define your scan object at the client side and pass it to
> AggregationClient.rowCount. You can refer to the AggregationClient javadoc and
> the associated TestAggregateProtocol test methods to get an idea.
>
> Thanks,
> Himanshu
>
> On Sun, May 1, 2011 at 12:29 PM, Wojciech Langiewicz
> <wlangiewicz@gmail.com> wrote:
>
>> Hi,
>>
>> On 01.05.2011 20:03, Himanshu Vashishtha wrote:
>>
>>> If you are interested in the row count only (and do not want to fetch the
>>> table rows to your client side), you can also try out
>>> https://issues.apache.org/jira/browse/HBASE-1512.
>>>
>>
>> Yes, I only want to count rows and apply filters or select columns.
>> Are filters also supported to work with those aggregate functions?
>>
>>
>>> PS: Which version are you on? The above patch is in main trunk as of now, so
>>> to use it you would have to check out the code and build it.
>>>
>>
>> I'm using version from CDH3, so it is: 0.90.1-cdh3u0, but I'm not bound to
>> this version.
>>
>> Coprocessors with aggregate functions seem to be the thing I need. Thanks!
>> --
>> Wojciech Langiewicz
>>
>>
>>> Thanks,
>>> Himanshu
>>>
>>>
>>> On Sun, May 1, 2011 at 11:55 AM, Doug Meil <doug.meil@explorysmedical.com>
>>> wrote:
>>>
>>>> What caching value are you using on the scan?  If you aren't setting this,
>>>> it's probably using the default - which is 1.  Which is slow.
>>>> http://hbase.apache.org/book.html#d379e3504
>>>>
>>>> Re:  "I would like to use HBase API, not MR job (because this cluster only
>>>> has HDFS and HBase installed)."
>>>>
>>>> For Very Large tables you want to start using an MR job for this.
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Wojciech Langiewicz [mailto:wlangiewicz@gmail.com]
>>>> Sent: Sunday, May 01, 2011 9:44 AM
>>>> To: user@hbase.apache.org
>>>> Subject: Row count without iterating over ResultScanner?
>>>>
>>>> Hi,
>>>> I would like to know if there's a way to quickly count the number of rows
>>>> in a scan result?
>>>> Right now I'm iterating over ResultScanner like this:
>>>> int count = 0;
>>>> for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
>>>>         ++count;
>>>> }
>>>> But with the number of rows reaching millions this takes a while.
>>>> I tried to find something in the documentation, but I didn't find anything.
>>>> I would like to use HBase API, not MR job (because this cluster only has
>>>> HDFS and HBase installed).
>>>>
>>>> Thanks for all help.
>>>>
>>>> --
>>>> Wojciech Langiewicz
>>>>
>>>>
>>>
>>
>
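
The point in the thread about scanner caching can be illustrated with a small, self-contained sketch. It does not use the HBase client at all: BatchingScanner, countRows and the round-trip counter are hypothetical names invented for this illustration, and the model assumes that each refill of the client-side buffer costs exactly one round trip (open and close calls are ignored). In a real client the corresponding knob is Scan.setCaching(int).

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class ScanCachingSketch {

    /** Hypothetical stand-in for a remote scanner that fetches rows in batches. */
    static final class BatchingScanner implements Iterator<Integer> {
        private final int totalRows;
        private final int caching;       // rows fetched per simulated round trip
        private int nextRow = 0;         // next row to hand to the caller
        private int bufferedUntil = 0;   // rows [nextRow, bufferedUntil) are already client-side
        private long roundTrips = 0;

        BatchingScanner(int totalRows, int caching) {
            if (caching < 1) {
                throw new IllegalArgumentException("caching must be >= 1");
            }
            this.totalRows = totalRows;
            this.caching = caching;
        }

        @Override
        public boolean hasNext() {
            return nextRow < totalRows;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            if (nextRow == bufferedUntil) {
                // Buffer is empty: simulate one round trip that fetches up to `caching` rows.
                bufferedUntil = Math.min(totalRows, bufferedUntil + caching);
                roundTrips++;
            }
            return nextRow++;
        }

        long roundTrips() {
            return roundTrips;
        }
    }

    /** Counts rows with the same kind of loop as in the thread; returns {rowCount, roundTrips}. */
    static long[] countRows(int totalRows, int caching) {
        BatchingScanner scanner = new BatchingScanner(totalRows, caching);
        long count = 0;
        while (scanner.hasNext()) {
            scanner.next();
            ++count;
        }
        return new long[] { count, scanner.roundTrips() };
    }

    public static void main(String[] args) {
        for (int caching : new int[] { 1, 100, 1000 }) {
            long[] r = countRows(1_000_000, caching);
            System.out.println("caching=" + caching + " rows=" + r[0] + " roundTrips=" + r[1]);
        }
    }
}
```

Under this simplified model the row count is the same for every caching value, while the number of simulated round trips for one million rows drops from 1,000,000 at caching=1 to 1,000 at caching=1000. The loop still touches every row on the client, which is why the thread also points to server-side counting via coprocessors.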

