hbase-user mailing list archives

From kiran chitturi <chitturikira...@gmail.com>
Subject Re: Inconsistent row count between mapreduce and shell count
Date Sun, 10 Feb 2013 02:51:34 GMT
On Sat, Feb 9, 2013 at 9:17 PM, lars hofhansl <larsh@apache.org> wrote:

> Hmm... Can you show us the exact commands you executed?

Below are the exact commands I used.

In the HBase shell, for the table 'documents', I used:
    count 'documents'

The MapReduce command is:
    /opt/hadoop-1.0.4/bin/hadoop jar /opt/hbase-0.94.1/hbase-0.94.1.jar \
        rowcounter -Dhbase.zookeeper.quorum="LucidN1,LucidN2,LucidN3" documents



> And just to rule out the obvious:
> 1. There were no writes while you did the row count?

Actually, we have a few automated programs that write tweets to the table
over time, so there may have been writes while the row count was running.
Should I disable writes when running the MapReduce job?

> 2. In the RowCount M/R case you specified neither a range nor any columns?

No, I specified neither a range nor any columns.

>
> Do you always get the exact same numbers in both cases? Or do they vary?

I just ran another MapReduce job and this time the number is 1394234. The
actual count from the shell is 2157447.

Thanks!


>
> ----- Original Message -----
> From: kiran chitturi <chitturikiran15@gmail.com>
> To: user <user@hbase.apache.org>
> Cc:
> Sent: Saturday, February 9, 2013 4:49 PM
> Subject: Re: Inconsistent row count between mapreduce and shell count
>
> Yes. I just counted the number of regions at
> 'http://machine1:60010/table.jsp?name=documents' and the count is 53, which
> is equal to the number of completed tasks in Hadoop.
>
>
> Thanks,
> Kiran.
>
>
> On Sat, Feb 9, 2013 at 7:43 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Apart from the 5 killed tasks, was the number of successful tasks equal to
> > the number of regions in your table?
> >
> > Thanks
> >
> > On Sat, Feb 9, 2013 at 4:14 PM, kiran chitturi <chitturikiran15@gmail.com> wrote:
> >
> > > Hi!
> > >
> > > I am using HBase version 0.94.1 on a distributed cluster of 20 nodes.
> > >
> > > When I ran count on a table in the HBase shell, I got a count of
> > > 2152416 rows.
> > >
> > > When I did the same thing using the rowcounter MapReduce job, I got the
> > > value below:
> > >
> > > org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
> > > 13/02/10 00:05:06 INFO mapred.JobClient:     ROWS=1389991
> > >
> > > The same thing happened when I used Pig to count rows or do other
> > > operations. The two results are inconsistent.
> > >
> > > During the MapReduce job, I noticed that 5 tasks were killed. When I
> > > traced them back to the TaskTracker logs on the node, they showed entries
> > > similar to the log below.
> > >
> > > 2013-02-09_23:58:58.40665 13/02/09 23:58:58 INFO mapred.TaskTracker: JVM with ID: jvm_201302090035_0015_m_1905604998 given task: attempt_201302090035_0015_m_000012_1
> > > 2013-02-09_23:59:03.57016 13/02/09 23:59:03 INFO mapred.TaskTracker: Received KillTaskAction for task: attempt_201302090035_0015_m_000012_1
> > > 2013-02-09_23:59:03.57034 13/02/09 23:59:03 INFO mapred.TaskTracker: About to purge task: attempt_201302090035_0015_m_000012_1
> > > 2013-02-09_23:59:03.61003 13/02/09 23:59:03 INFO util.ProcessTree: Killing process group9745 with signal TERM. Exit code 0
> > >
> > > I also ran the 'hbck' tool, but it shows no inconsistencies.
> > >
> > > Can you please suggest why there is an inconsistency and how I can
> > > correct it?
> > >
> > > Thanks,
> > > --
> > > Kiran Chitturi
> > >
> >
>
>
>
> --
> Kiran Chitturi
>
>


-- 
Kiran Chitturi
