hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Inconsistent row count between mapreduce and shell count
Date Sun, 10 Feb 2013 07:05:21 GMT
Kiran:
Take a look at src/main/ruby/shell/commands/move.rb

You will find help there on how to move a region.
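
As a rough sketch of what that help describes (assuming the 0.94 shell's move syntax, which takes the encoded region name, i.e. the hash suffix of the full region name, and optionally a server name of the form host,port,startcode): the small wrapper below only prints the shell command. The function name build_move_cmd and the default values are hypothetical placeholders, not values from this thread.

```shell
#!/usr/bin/env bash
# Sketch only: print an HBase shell `move` command for one region.
# ENCODED_REGION and TARGET_SERVER are placeholders (assumptions); look up
# the real values in the master web UI before using this.
ENCODED_REGION="${ENCODED_REGION:-ENCODED_REGIONNAME}"
TARGET_SERVER="${TARGET_SERVER:-}"

build_move_cmd() {
  local region="$1" server="$2"
  if [ -n "$server" ]; then
    # Move the region to a specific region server.
    printf "move '%s', '%s'\n" "$region" "$server"
  else
    # No server given: the destination is left to HBase to choose.
    printf "move '%s'\n" "$region"
  fi
}

build_move_cmd "$ENCODED_REGION" "$TARGET_SERVER"
```

Piping the printed line into the hbase shell should run it. Moving a region only changes which region server hosts it; the data itself stays in HDFS.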

Cheers

On Sat, Feb 9, 2013 at 9:46 PM, kiran chitturi <chitturikiran15@gmail.com> wrote:

> Many thanks, Lars, for your suggestions! I have added them to the command:
>
> /opt/hadoop-1.0.4/bin/hadoop jar /opt/hbase-0.94.1/hbase-0.94.1.jar
> rowcounter -Dhbase.zookeeper.quorum="LucidN1,LucidN2,LucidN3"
> -Dhbase.client.scanner.caching=1000
> -Dmapred.map.tasks.speculative.execution=false documents
>
> I have stopped the data sources that write to the table, but it did not
> help. The row count reported by the MapReduce job is not much different.
>
> However, the row count returned is stable once I stopped writing data to
> the table (I ran the command 3 times). The shell count is also the same
> once I stopped writing.
>
> Since most of the rows are tweets, around 1.4 million rows are stored on a
> single data node (region server).
>
> Do you know of any way that I can reassign the regions in the table without
> losing the data? Would it make a difference?
>
> Thank you,
> Kiran.
>
>
>
>
> On Sat, Feb 9, 2013 at 11:38 PM, lars hofhansl <larsh@apache.org> wrote:
>
> > That looks all as it should.
> > Unless you somehow pointed the M/R job to another cluster I have no good
> > explanation.
> >
> >
> > Would be interesting to see whether in the absence of writes you'd always
> > get precisely the same numbers.
> > (Looks like it might be the case; your 2nd run is not wildly different
> > from the first).
> >
> >
> > This is a bit disconcerting. Is there anything "interesting" in the logs?
> >
> >
> > Aside: For performance reasons you'd probably want to enable scanner
> > caching for the M/R: -Dhbase.client.scanner.caching=100 (or 1000)
> >
> > And also turn off speculative execution (we should do that by default):
> > -Dmapred.map.tasks.speculative.execution=false
> >
> > It might be the speculative execution that throws the job off, I am just
> > guessing now.
> >
> >
> > -- Lars
> >
> > ________________________________
> > From: kiran chitturi <chitturikiran15@gmail.com>
> > To: user <user@hbase.apache.org>; lars hofhansl <larsh@apache.org>
> > Sent: Saturday, February 9, 2013 6:51 PM
> > Subject: Re: Inconsistent row count between mapreduce and shell count
> >
> >
> >
> >
> >
> >
> >
> > On Sat, Feb 9, 2013 at 9:17 PM, lars hofhansl <larsh@apache.org> wrote:
> >
> > Hmm... Can you show us the exact commands you executed?
> > >
> > >
> > I am writing below the exact commands that i have used.
> >
> > In the hbase shell, for the table documents i have used
> >    count 'documents'
> >
> > The mapreduce command is
> >     /opt/hadoop-1.0.4/bin/hadoop jar /opt/hbase-0.94.1/hbase-0.94.1.jar
> > rowcounter -Dhbase.zookeeper.quorum="LucidN1,LucidN2,LucidN3" documents
> >
> >
> > And just to rule out the obvious:
> > >1. There were no writes while you did the row count?
> > >
> >            Actually, we have a few automated programs which write tweets
> > to the table over time, so there might be writes while the row count is
> > running.
> >            Should I disable writes when running the MapReduce job?
> >
> > >2. In the RowCount M/R case you specified neither a range nor any columns?
> > >
> > >
> >     No
> >
> > >Do you always get the exact same numbers in both cases? Or do they vary?
> > >
> >    I just did another map reduce and this time the number is 1394234. The
> > actual count from shell is 2157447
> >
> > Thanks!
> >
> >
> > >
> > >----- Original Message -----
> > >From: kiran chitturi <chitturikiran15@gmail.com>
> > >To: user <user@hbase.apache.org>
> > >Cc:
> > >Sent: Saturday, February 9, 2013 4:49 PM
> > >Subject: Re: Inconsistent row count between mapreduce and shell count
> > >
> > >Yes. I just counted the number of regions in
> > >'http://machine1:60010/table.jsp?name=documents' and the count is 53, which
> > >is equal to the number of completed tasks in Hadoop.
> > >
> > >
> > >Thanks,
> > >Kiran.
> > >
> > >
> > >On Sat, Feb 9, 2013 at 7:43 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > >> Apart from the 5 killed tasks, was the number of successful tasks equal to
> > >> the number of regions in your table ?
> > >>
> > >> Thanks
> > >>
> > >> On Sat, Feb 9, 2013 at 4:14 PM, kiran chitturi <chitturikiran15@gmail.com> wrote:
> > >>
> > >> > Hi!
> > >> >
> > >> > I am using HBase version 0.94.1 on a distributed cluster of 20 nodes.
> > >> >
> > >> > When I ran count on a table in the HBase shell, I got a count of
> > >> > 2152416 rows.
> > >> >
> > >> > When I did the same thing using the rowcounter MapReduce job, I got the
> > >> > value below:
> > >> >
> > >> >
> > >> > org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
> > >> > 13/02/10 00:05:06 INFO mapred.JobClient:     ROWS=1389991
> > >> >
> > >> > The same thing happened when I used Pig to count or do other operations.
> > >> > The two results are inconsistent.
> > >> >
> > >> > During the MapReduce job, I noticed that 5 tasks were killed. When I
> > >> > traced them back to the tasktracker logs of the node, I found entries
> > >> > similar to the log below.
> > >> >
> > >> > 2013-02-09_23:58:58.40665 13/02/09 23:58:58 INFO mapred.TaskTracker: JVM with ID: jvm_201302090035_0015_m_1905604998 given task: attempt_201302090035_0015_m_000012_1
> > >> > 2013-02-09_23:59:03.57016 13/02/09 23:59:03 INFO mapred.TaskTracker: Received KillTaskAction for task: attempt_201302090035_0015_m_000012_1
> > >> > 2013-02-09_23:59:03.57034 13/02/09 23:59:03 INFO mapred.TaskTracker: About to purge task: attempt_201302090035_0015_m_000012_1
> > >> > 2013-02-09_23:59:03.61003 13/02/09 23:59:03 INFO util.ProcessTree: Killing process group9745 with signal TERM. Exit code 0
> > >> >
> > >> > I have also run the 'hbck' tool, but it shows no inconsistencies.
> > >> >
> > >> > Can you please suggest why there is an inconsistency and how I can
> > >> > correct it?
> > >> >
> > >> > Thanks,
> > >> > --
> > >> > Kiran Chitturi
> > >> >
> > >>
> > >
> > >
> > >
> > >--
> > >Kiran Chitturi
> > >
> > >
> >
> >
> > --
> >
> > Kiran Chitturi
> >
>
>
>
> --
> Kiran Chitturi
>
