hadoop-yarn-dev mailing list archives

From Karthik Kambatla <ka...@cloudera.com>
Subject Re: RM ha issuses
Date Tue, 01 Apr 2014 04:51:16 GMT
It might be a good first step to compare the configurations for the vanilla
MR job and Hive MR job.
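
One way to make that comparison, sketched under assumptions: each job's effective configuration has been saved as a Hadoop-style XML file (for example the job.xml that MapReduce writes for a submitted job), and the two file names below are hypothetical. The helpers `load_job_conf` and `diff_confs` are illustrative names, not part of Hadoop or Hive.

```python
import sys
import xml.etree.ElementTree as ET


def load_job_conf(path):
    """Read a Hadoop-style configuration XML file into a {name: value} dict."""
    root = ET.parse(path).getroot()
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing or empty <value> element is treated as an empty string.
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def diff_confs(left, right):
    """Return {name: (left_value, right_value)} for every property that differs.

    A property that is absent on one side is reported with None on that side.
    """
    diffs = {}
    for name in sorted(set(left) | set(right)):
        left_value, right_value = left.get(name), right.get(name)
        if left_value != right_value:
            diffs[name] = (left_value, right_value)
    return diffs


if __name__ == "__main__":
    # Hypothetical usage: python diff_job_conf.py vanilla_job.xml hive_job.xml
    if len(sys.argv) == 3:
        vanilla = load_job_conf(sys.argv[1])
        hive = load_job_conf(sys.argv[2])
        for name, (v, h) in diff_confs(vanilla, hive).items():
            print(f"{name}: vanilla={v!r} hive={h!r}")
    else:
        print("usage: diff_job_conf.py <vanilla_job.xml> <hive_job.xml>")
```

Only the properties that differ are printed, so anything related to RM HA, recovery, or retries that Hive overrides should stand out. Which of those, if any, explains the restart is not something this sketch can tell you.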


On Mon, Mar 31, 2014 at 7:06 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:

> Hi Karthik,
> I ran a plain MR job, and it works well during RM failover.
>
> job progress:
> (the failover was highlighted in red in the original message)
>
> 14/04/01 10:01:38 INFO mapreduce.Job:  map 61% reduce 8%
> 14/04/01 10:01:40 INFO mapreduce.Job:  map 61% reduce 10%
> 14/04/01 10:01:41 INFO mapreduce.Job:  map 62% reduce 10%
> 14/04/01 10:01:44 INFO mapreduce.Job:  map 63% reduce 10%
> 14/04/01 10:01:47 INFO mapreduce.Job:  map 64% reduce 10%
> 14/04/01 10:02:36 INFO mapreduce.Job:  map 60% reduce 0%
> 14/04/01 10:02:40 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> 14/04/01 10:03:00 INFO mapreduce.Job:  map 63% reduce 0%
> 14/04/01 10:03:02 INFO mapreduce.Job:  map 66% reduce 2%
> 14/04/01 10:03:04 INFO mapreduce.Job:  map 67% reduce 2%
> 14/04/01 10:03:06 INFO mapreduce.Job:  map 69% reduce 2%
> 14/04/01 10:03:08 INFO mapreduce.Job:  map 71% reduce 2%
> 14/04/01 10:03:10 INFO mapreduce.Job:  map 72% reduce 2%
>
> So it is only the Hive job whose tasks all restart during failover. Please take a look.
>
>
>
> On Tue, Apr 1, 2014 at 7:20 AM, Azuryy <azuryyyu@gmail.com> wrote:
>
> > I will run a MR job to verify it.
> >
> > Stop RM means yarn-daemon.sh stop resourcemanager
> >
> > Thanks
> >
> > > On April 1, 2014, at 0:38, Karthik Kambatla <kasha@cloudera.com> wrote:
> > >
> > > Thanks for reporting this, Azuryy. Indeed, this is surprising.
> > >
> > > I don't quite understand how Hive works; do you mind running a vanilla MR
> > > job and verifying if this is indeed the case? Also, when you say you
> > > stopped the Active RM, you mean only the RM process - correct?
> > >
> > >
> > >> On Mon, Mar 31, 2014 at 3:46 AM, Azuryy Yu <azuryyyu@gmail.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> I built from trunk and configured RM HA, then submitted a Hive job with
> > >> 11 maps in total. I stopped the active RM when 6 maps had finished.
> > >>
> > >> But Hive shows all the map tasks restarting from scratch. This conflicts
> > >> with the design description.
> > >>
> > >> job progress:
> > >> 2014-03-31 18:44:14,088 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 713.84 sec
> > >> 2014-03-31 18:44:15,128 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 722.83 sec
> > >> 2014-03-31 18:44:16,160 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 731.95 sec
> > >> 2014-03-31 18:44:17,191 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 744.17 sec
> > >> 2014-03-31 18:44:18,220 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 756.22 sec
> > >> 2014-03-31 18:44:19,250 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 762.4 sec
> > >> 2014-03-31 18:44:20,281 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 774.64 sec
> > >> 2014-03-31 18:44:21,306 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 786.49 sec
> > >> 2014-03-31 18:44:22,334 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 792.59 sec
> > >> 2014-03-31 18:44:23,363 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 807.58 sec
> > >> 2014-03-31 18:44:24,392 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 815.96 sec
> > >> 2014-03-31 18:44:25,416 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 823.83 sec
> > >> 2014-03-31 18:44:26,443 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 826.84 sec
> > >> 2014-03-31 18:44:27,472 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 832.16 sec
> > >> 2014-03-31 18:44:28,501 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 839.73 sec
> > >> 2014-03-31 18:44:29,531 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 844.45 sec
> > >> 2014-03-31 18:44:30,564 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 760.34 sec
> > >> 2014-03-31 18:44:31,728 Stage-1 map = 0%,  reduce = 0%
> > >> 2014-03-31 18:45:06,918 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 213.81 sec
> > >> 2014-03-31 18:45:07,952 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 216.83 sec
> > >> 2014-03-31 18:45:08,979 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 229.15 sec
> > >> 2014-03-31 18:45:10,007 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 244.42 sec
> > >> 2014-03-31 18:45:11,040 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 247.31 sec
> > >> 2014-03-31 18:45:12,072 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 259.5 sec
> > >> 2014-03-31 18:45:13,105 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 274.72 sec
> > >> 2014-03-31 18:45:14,135 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 280.76 sec
> > >> 2014-03-31 18:45:15,170 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 292.9 sec
> > >> 2014-03-31 18:45:16,202 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 305.16 sec
> > >> 2014-03-31 18:45:17,233 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 314.21 sec
> > >> 2014-03-31 18:45:18,264 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 323.34 sec
> > >> 2014-03-31 18:45:19,294 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 335.6 sec
> > >> 2014-03-31 18:45:20,325 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 344.71 sec
> > >> 2014-03-31 18:45:21,355 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 353.8 sec
> > >> 2014-03-31 18:45:22,385 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 366.06 sec
> > >> 2014-03-31 18:45:23,415 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 375.2 sec
> > >> 2014-03-31 18:45:24,449 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 384.28 sec
> > >> 2014-03-31 18:45:25,481 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 396.54 sec
> > >> 2014-03-31 18:45:26,512 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 408.72 sec
> > >> 2014-03-31 18:45:27,549 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 414.69 sec
> > >> 2014-03-31 18:45:28,582 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 426.99 sec
> > >> 2014-03-31 18:45:29,614 Stage-1 map = 32%,  reduce = 0%, Cumulative CPU 439.25 sec
> > >> 2014-03-31 18:45:30,653 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 448.25 sec
> > >> 2014-03-31 18:45:31,683 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 460.5 sec
> > >> 2014-03-31 18:45:32,723 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 469.63 sec
> > >> 2014-03-31 18:45:33,754 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 478.67 sec
> > >>
> >
>
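
The progress resets in the two quoted logs can be located mechanically rather than by eye. The following is a minimal sketch, assuming the progress lines look like the ones quoted in this thread (`map 61% reduce 8%` from the MR client and `map = 68%,  reduce = 0%` from Hive). The function name `find_progress_drops` is hypothetical, and only the map percentage is tracked.

```python
import re

# Matches both "map 61%" (MR client) and "map = 68%" (Hive console).
# The "mapreduce.Job" prefix is skipped because no digits follow "map" there.
_MAP_PCT = re.compile(r"\bmap\s*=?\s*(\d+)%")


def find_progress_drops(lines):
    """Return (previous_pct, current_pct, line) for each drop in map progress."""
    drops = []
    previous = None
    for line in lines:
        match = _MAP_PCT.search(line)
        if match is None:
            continue  # not a progress line, e.g. the failover notice
        current = int(match.group(1))
        if previous is not None and current < previous:
            drops.append((previous, current, line))
        previous = current
    return drops
```

Applied to the logs above, this would flag the 64% to 60% step in the vanilla MR run, and the 86% to 82% and 82% to 0% steps in the Hive run, which is the contrast the thread is about.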
