hbase-user mailing list archives

From "Daniel Blaisdell" <lunk.dj...@gmail.com>
Subject Re: any chance to get the size of a table?
Date Mon, 21 Jul 2008 15:47:28 GMT
RowCount is great example code for an introduction to MapReduce programs
over HBase. I found it very beneficial to my understanding of HBase to
reimplement a RowCount job from the ground up as an exercise.
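To make the shape of such a job concrete, here is a minimal sketch in plain Java. It is not the RowCount source and does not use the HBase MapReduce API. It assumes a table can be modeled as a list of regions, each a list of row keys standing in for one map task's input split; the names RowCountSketch, mapRegion and countRows are hypothetical. Each "mapper" visits every row of its split and bumps a counter, and the "reduce" step sums the per-split counts.

```java
import java.util.List;

// Hypothetical in-memory model: each inner list stands in for the row keys of
// one region (one map task's input split). This is NOT the HBase RowCount code.
class RowCountSketch {

    // "Map" phase: a mapper sees every row of its split and bumps a counter once per row.
    static long mapRegion(List<String> regionRows) {
        long counter = 0;
        for (String rowKey : regionRows) {
            counter++; // one increment per row visited
        }
        return counter;
    }

    // "Reduce" phase: sum the per-split counters. Splits are processed in
    // parallel here, loosely mirroring map tasks running on different nodes.
    static long countRows(List<List<String>> regions) {
        return regions.parallelStream()
                      .mapToLong(RowCountSketch::mapRegion)
                      .sum();
    }

    public static void main(String[] args) {
        List<List<String>> regions = List.of(
            List.of("row-a", "row-b", "row-c"),
            List.of("row-d", "row-e"),
            List.of("row-f"));
        System.out.println(countRows(regions)); // prints 6
    }
}
```

The map side still visits every row; running splits in parallel shortens wall-clock time but does not avoid reading the whole table.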

-Daniel

On Mon, Jul 21, 2008 at 11:36 AM, Jonathan Gray <jlist@streamy.com> wrote:
> Having spent many years in the RDBMS world, the straight answer to that is,
> it depends.
>
> Postgres was notorious for being a poor performer when it came to count().
> That's because Postgres fetches each row off of disk while doing the count,
> for safety reasons.  MySQL, on the other hand, "trusts" its indexes and
> therefore can perform a full table count() just by pulling an index off of
> disk.
>
> The safer the database, the longer a row count will take.
>
> However, most RDBMSs keep table statistics for use in query planning. If you
> want "rough" row counts, you can also do straight select queries against the
> statistics tables.
>
> As stack suggests, our solution for doing row counting is running MR jobs.
> In Postgres we used to have a TRIGGER system that would maintain
> pre-computed counts for things we need to aggregate.  Things like this tend
> to be a nightmare any way you slice it :)
>
> Jon
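A minimal sketch of the pre-computed-count idea, in plain Java with an in-memory map rather than any database or HBase API (CountedTable and its methods are hypothetical). Every write path updates the counter, so count() is a field read instead of a scan; the catch is that the counter is only correct if each write tells an insert from an update, and a real delete from a no-op.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a trigger-style maintained row count: every write
// path updates a precomputed counter so count() does not need a full scan.
class CountedTable {
    private final Map<String, String> rows = new HashMap<>();
    private long rowCount = 0;

    // Like an insert trigger, but only a brand-new key adds a row;
    // overwriting an existing key is an update and must not bump the count.
    void put(String rowKey, String value) {
        boolean isNew = !rows.containsKey(rowKey);
        rows.put(rowKey, value);
        if (isNew) {
            rowCount++;
        }
    }

    // Like a delete trigger: only removing a key that exists decrements.
    void delete(String rowKey) {
        if (rows.containsKey(rowKey)) {
            rows.remove(rowKey);
            rowCount--;
        }
    }

    long count() {
        return rowCount;
    }

    public static void main(String[] args) {
        CountedTable t = new CountedTable();
        t.put("r1", "a");
        t.put("r2", "b");
        t.put("r1", "c"); // update: count unchanged
        t.delete("r3");   // missing key: count unchanged
        t.delete("r2");
        System.out.println(t.count()); // prints 1
    }
}
```

This single-threaded version sidesteps the hard part: with many concurrent writers, the counter update has to be atomic with the row write, which is one reason such schemes are painful to maintain.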
>
> -----Original Message-----
> From: ZhaoWei [mailto:wzhao1984@gmail.com]
> Sent: Monday, July 21, 2008 7:15 AM
> To: hbase-user@hadoop.apache.org; jdcryans@gmail.com
> Subject: Re: any chance to get the size of a table?
>
> Thanks J-D, that sounds annoying. Should the row count be a piece of
> metadata?
> What does an RDBMS do when one types "select count(xxx) from xxx"?
>
>> Zhao,
>>
>> Yes, the only way is to use a scanner, but it will take a _long_ time.
>> HBASE-32 <https://issues.apache.org/jira/browse/HBASE-32> is about adding
>> a row count estimator. For those who want to know why it's so slow: a
>> scanner that goes over every row of a table has to do a read request on
>> disk for each one of them (except for the data in the memcache that is
>> waiting to be flushed). If you have 6,500,000 rows, like I saw last week on
>> the IRC channel, it may take well over 80 minutes (it depends on the
>> CPU/IO/network load, hardware, etc.).
>>
>> J-D
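The cost J-D describes can be sketched in plain Java: counting through a scanner is a loop with one row fetch per iteration, so time grows linearly with the number of rows. The iterator below only stands in for an HBase scanner (ScanCountSketch and its methods are hypothetical, not HBase API), and the second method does the arithmetic behind the 80-minute figure.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: a full-table count through a scanner is a loop that
// pulls every row, so its cost grows linearly with the number of rows.
class ScanCountSketch {

    // Stand-in for a scanner: any iterator that yields one row per next() call.
    static long countWithScanner(Iterator<String> scanner) {
        long count = 0;
        while (scanner.hasNext()) {
            scanner.next(); // per J-D, in HBase each row here means a disk read unless still in memcache
            count++;
        }
        return count;
    }

    // Rows per second implied by scanning `rows` rows in `minutes` minutes.
    static double impliedRowsPerSecond(long rows, double minutes) {
        return rows / (minutes * 60.0);
    }

    public static void main(String[] args) {
        System.out.println(countWithScanner(List.of("r1", "r2", "r3").iterator())); // prints 3
        // 6,500,000 rows in 80 minutes (4,800 seconds)
        System.out.printf("%.0f%n", impliedRowsPerSecond(6_500_000L, 80)); // prints 1354
    }
}
```

Finishing 6,500,000 rows in 80 minutes would require sustaining about 1,354 rows per second; "well over 80 minutes" means the scan in that case ran slower than that.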
>>
>> On Mon, Jul 21, 2008 at 5:21 AM, ZhaoWei <wzhao1984@gmail.com> wrote:
>>
>> > Hi J-D,
>> >  How do I get the row count of a table? Only with a scanner?
>> >
>> >
>> > Thanks!
>> >
>> > > Daniel,
>> > >
>> > > Sorry, this feature is still missing in HBase. For the moment, the best
>> > > you can do is to use the HDFS web UI. If you would like to see this in a
>> > > future release, feel free to file a Jira:
>> > > https://issues.apache.org/jira/browse/HBASE
>> > >
>> > > J-D
>> > >
>> > > On Sat, Jul 19, 2008 at 5:58 PM, Daniel <d4nielfree@gmail.com> wrote:
>> > >
>> > > > hi all,
>> > > >    It's a bit strange, but I can't find any class or method to get the
>> > > > 'size' of a created table, maybe the total size of all the HStores?
>> > > > Or is there any command in HQL that can do this?
>> > > >    Thanks.
>> > > >
>> > > > Daniel
>> > > >
>> >
>
>
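For the original question of a table's size, one rough answer is the total size of the files holding the table's data, which is what J-D's pointer to the HDFS web UI gets at. The sketch below is only a local-filesystem analogue in plain Java (DirSizeSketch is hypothetical), assuming a directory that contains only that table's files; it walks the tree and sums the sizes of regular files. On a real cluster, the HDFS web UI or hadoop fs -du on the table's directory would play the same role.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical local-disk analogue of "total size of all the HStores":
// sum the sizes of every regular file under one directory tree.
class DirSizeSketch {

    static long totalBytes(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory to measure (assumed to hold one table's files); defaults to ".".
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(totalBytes(root) + " bytes");
    }
}
```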
