hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: count of rows in table
Date Fri, 30 Jul 2010 01:28:38 GMT
If someone can share the command line for running RowCounter, that would be
great.

Also, the HBase shell count command doesn't require a column name. Why does
RowCounter require one?

Thanks

On Thu, Jul 29, 2010 at 4:55 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:

> Hi,
>
> That table appears to be empty.  E.g.:
>
> 10/07/29 22:38:43 INFO mapred.JobClient:     Map input records=0
>
>
> So back to the count issue... Counting in databases is a classic
> problem. Unless your DB system is keeping stats on how many
> inserts/deletes and thus how big it thinks the table is, you have to
> count all the rows by reading them.  HBase is no different, and a
> little harder, because we have a variable length data format, so we
> can't just estimate row sizes from file sizes.  Keeping distributed
> stats is not impossible, but certainly not on any priority list to be
> implemented - of course JIRAs/patches welcome etc.
>
> -ryan
>
>
> On Thu, Jul 29, 2010 at 3:48 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > We use HBase 0.20.5
> >
> > Here is the snippet from RowCounter output:
> >
> > 10/07/29 22:38:42 DEBUG client.HTable$ClientScanner: Finished with scanning
> > at REGION => {NAME =>
> > '2__HB_NOINC_ORCL_SQLLDR_0728-THREEGPPSPEECHCALLS-1280408509541-0,DFF46493EB352D0E31CBFA4652E6EC06,1280412540858',
> > STARTKEY => 'DFF46493EB352D0E31CBFA4652E6EC06', ENDKEY => '', ENCODED =>
> > 1375318608, TABLE => {{NAME =>
> > '2__HB_NOINC_ORCL_SQLLDR_0728-THREEGPPSPEECHCALLS-1280408509541-0', FAMILIES
> > => [{NAME => 'd', COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000',
> > BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME =>
> > 'i', COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000', BLOCKSIZE =>
> > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, {NAME => 'v',
> > COMPRESSION => 'GZ', VERSIONS => '2', TTL => '31536000', BLOCKSIZE =>
> > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
> > 10/07/29 22:38:42 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0
> > is done. And is in the process of commiting
> > 10/07/29 22:38:42 INFO mapred.LocalJobRunner:
> > 10/07/29 22:38:42 INFO mapred.TaskRunner: Task attempt_local_0001_m_000000_0
> > is allowed to commit now
> > 10/07/29 22:38:42 INFO mapred.FileOutputCommitter: Saved output of task
> > 'attempt_local_0001_m_000000_0' to
> > file:/usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc
> > 10/07/29 22:38:42 INFO mapred.LocalJobRunner:
> > 10/07/29 22:38:42 INFO mapred.TaskRunner: Task
> > 'attempt_local_0001_m_000000_0' done.
> > 10/07/29 22:38:43 INFO mapred.JobClient:  map 100% reduce 0%
> > 10/07/29 22:38:43 INFO mapred.JobClient: Job complete: job_local_0001
> > 10/07/29 22:38:43 INFO mapred.JobClient: Counters: 6
> > 10/07/29 22:38:43 INFO mapred.JobClient:   FileSystemCounters
> > 10/07/29 22:38:43 INFO mapred.JobClient:     FILE_BYTES_READ=1592883
> > 10/07/29 22:38:43 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1624956
> > 10/07/29 22:38:43 INFO mapred.JobClient:   Map-Reduce Framework
> > 10/07/29 22:38:43 INFO mapred.JobClient:     Map input records=0
> > 10/07/29 22:38:43 INFO mapred.JobClient:     Spilled Records=0
> > 10/07/29 22:38:43 INFO mapred.JobClient:     Map input bytes=0
> > 10/07/29 22:38:43 INFO mapred.JobClient:     Map output records=0
> >
> > [sjc1-hadoop8.sjc1:hadoop 3705]ls -l
> > /usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc/part-00000
> > -rwxrwxrwx 1 hadoop users 0 Jul 29 22:38
> > /usr/local/hadoop/trunk.80-275066/hbase-0.20.5/rc/part-00000
> >
> > But there are many records in the table I was querying.
> >
> > Can someone comment?
> >
> > On Thu, Jul 29, 2010 at 2:26 PM, Jean-Daniel Cryans <jdcryans@apache.org>
> > wrote:
> >
> >> In 0.89 you can specify CACHE for the count command. Set it higher (it
> >> defaults to 10 rows per call).
> >>
> >> Also you can use the RowCounter MR job.
> >>
> >> J-D
> >>
> >> On Thu, Jul 29, 2010 at 2:22 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> > Hi,
> >> > The count command in the HBase shell is quite slow.
> >> > Is there a way to obtain a count faster?
> >> >
> >> > Thanks
> >> >
> >>
> >
>
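
A note on the CACHE suggestion quoted above: raising it does not avoid reading
every row (as Ryan says, the rows still have to be counted by reading them), it
only lets each call to the server return more rows, so fewer calls are needed.
The Python sketch below is a toy model of that effect, not HBase code. The
function name count_rows, its parameters, and the in-memory range standing in
for a table are all hypothetical, and the model ignores the final call a real
scanner makes to learn that it is exhausted.

```python
import itertools


def count_rows(rows, caching):
    """Count rows by reading all of them in batches of at most `caching`.

    Returns (row_count, fetch_calls). `rows` is any iterable standing in for
    a table scan; `caching` plays the role of the shell's CACHE setting.
    Hypothetical helper for illustration only.
    """
    if caching < 1:
        raise ValueError("caching must be at least 1")
    it = iter(rows)
    row_count = 0
    fetch_calls = 0
    while True:
        # One simulated call to the server: fetch up to `caching` rows.
        batch = list(itertools.islice(it, caching))
        if not batch:
            break  # the empty, terminating fetch is not counted
        fetch_calls += 1
        row_count += len(batch)
    return row_count, fetch_calls


if __name__ == "__main__":
    table = range(1_000_000)  # stand-in for a table with a million rows
    for caching in (10, 1000):
        rows_seen, calls = count_rows(table, caching)
        print(f"caching={caching}: rows={rows_seen}, fetch calls={calls}")
```

With the default of 10 rows per call mentioned in the thread, a million-row
table needs 100,000 fetches in this model; with 1000 rows per call it needs
1,000. The row count is the same either way, only the number of calls drops.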
