hbase-user mailing list archives

From Himanshu Vashishtha <hvash...@cs.ualberta.ca>
Subject Re: Row count without iterating over ResultScanner?
Date Sun, 01 May 2011 18:03:28 GMT
If you are interested in the row count only (and do not want to fetch the
table rows to your client side), you can also try out
https://issues.apache.org/jira/browse/HBASE-1512.

PS: Which version are you on? The above patch is only in trunk as of now, so
to use it you would have to check out the code and build it.

Thanks,
Himanshu


On Sun, May 1, 2011 at 11:55 AM, Doug Meil <doug.meil@explorysmedical.com> wrote:

> What caching value are you using on the scan?  If you aren't setting this,
> it's probably using the default, which is 1, and that is slow.
> http://hbase.apache.org/book.html#d379e3504
>
> Re:  "I would like to use HBase API, not MR job (because this cluster only
> has HDFS and HBase installed)."
>
> For very large tables you will want to start using an MR job for this.
>
>
> -----Original Message-----
> From: Wojciech Langiewicz [mailto:wlangiewicz@gmail.com]
> Sent: Sunday, May 01, 2011 9:44 AM
> To: user@hbase.apache.org
> Subject: Row count without iterating over ResultScanner?
>
> Hi,
> I would like to know if there's a way to quickly count the number of rows in
> a scan result?
> Right now I'm iterating over ResultScanner like this:
> int count = 0;
> for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
>        ++count;
> }
> But with the number of rows reaching millions, this takes a while.
> I tried to find something in the documentation, but I didn't find anything.
> I would like to use HBase API, not MR job (because this cluster only has
> HDFS and HBase installed).
>
> Thanks for all help.
>
> --
> Wojciech Langiewicz
>
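
To illustrate why the caching value matters for the loop quoted above, here is a rough sketch in plain Java. It does not call HBase at all: ScanRoundTrips and roundTrips are hypothetical names made up for this illustration, and the model assumes one client-to-server fetch per batch of `caching` rows, which is only an approximation of what the scanner does. The 5,000,000-row figure is likewise an assumed table size, not a number from this thread.

```java
// Hypothetical helper, not part of the HBase API: estimates how many
// client-to-server fetches a full scan needs for a given caching value,
// assuming one fetch per batch of `caching` rows.
final class ScanRoundTrips {

    // rows: number of rows the scan returns; caching: rows fetched per round trip.
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        // Ceiling division: a final, partially filled batch still costs one trip.
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 5_000_000L; // assumed table size, for illustration only
        for (int caching : new int[] {1, 100, 1000}) {
            System.out.println("caching=" + caching
                    + " -> about " + roundTrips(rows, caching) + " round trips");
        }
    }
}
```

In the real client the value is set on the Scan object with setCaching(int) before opening the scanner. Larger values mean fewer round trips but more memory held per fetch, so the right number depends on row size.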
