hadoop-common-user mailing list archives

From francexo83 <francex...@gmail.com>
Subject Re: MR job fails with too many mappers
Date Thu, 20 Nov 2014 14:14:12 GMT
Hi,

as I said before, I wrote TableInputFormat and RecordReader extensions that
read input data from an HBase table; in my case every single row is
associated with a single InputSplit.

For example, if I have 300000 rows to process, my custom TableInputFormat
will generate 300000 input splits and, as a result, 300000 mapper tasks in
my MapReduce job.
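
For reference, here is a minimal sketch of that one-split-per-row idea. It is
not my exact code: the class name is made up, it assumes the HBase 0.98 client
API, and the matching custom RecordReader is not shown.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

public class RowPerSplitTableInputFormat extends TableInputFormat {

  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    Configuration conf = context.getConfiguration();
    TableName tableName = TableName.valueOf(conf.get(TableInputFormat.INPUT_TABLE));
    List<InputSplit> splits = new ArrayList<InputSplit>();

    // Scan row keys only and emit one split per row; the custom RecordReader
    // (not shown) then reads exactly the row named by its split.
    Scan keysOnly = new Scan();
    keysOnly.setFilter(new FirstKeyOnlyFilter());

    HTable table = new HTable(conf, tableName);
    try {
      ResultScanner scanner = table.getScanner(keysOnly);
      try {
        for (Result r : scanner) {
          byte[] row = r.getRow();
          // Start row and end row both carry the single target row key.
          splits.add(new TableSplit(tableName, row, row, ""));
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
    return splits;
  }
}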

That's all.

Regards



2014-11-20 6:02 GMT+01:00 Susheel Kumar Gadalay <skgadalay@gmail.com>:

> In which case does the split metadata go beyond 10 MB?
> Can you give some details of your input file and splits?
>
> On 11/19/14, francexo83 <francexo83@gmail.com> wrote:
> > Thank you very much for your suggestion; it was very helpful.
> >
> > This is what I have after turning off log aggregation:
> >
> > 2014-11-18 18:39:01,507 INFO [main]
> > org.apache.hadoop.service.AbstractService: Service
> > org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state STARTED;
> > cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
> > java.io.IOException: Split metadata size exceeded 10000000. Aborting job
> > job_1416332245344_0004
> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException:
> > java.io.IOException: Split metadata size exceeded 10000000. Aborting job
> > job_1416332245344_0004
> >         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1551)
> >         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1406)
> >         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1373)
> >         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> >         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> >         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> >         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> >         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:986)
> >         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
> >         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1249)
> >         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1049)
> >         at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> >         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1460)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:422)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
> >         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1456)
> >         at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1389)
> > Caused by: java.io.IOException: Split metadata size exceeded 10000000. Aborting job job_1416332245344_0004
> >         at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:53)
> >         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1546)
> >
> >
> > I exceeded the split metadata size, so I added the following property to
> > mapred-site.xml and it worked:
> >
> > <property>
> >     <name>mapreduce.job.split.metainfo.maxsize</name>
> >     <value>500000000</value>
> > </property>
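> >
> > The same limit can also be raised per job on the driver's Configuration
> > before submitting (a minimal sketch; the job name and the value below are
> > illustrative, and a negative value is generally reported to disable the
> > check entirely):
> >
> >     import org.apache.hadoop.conf.Configuration;
> >     import org.apache.hadoop.hbase.HBaseConfiguration;
> >     import org.apache.hadoop.mapreduce.Job;
> >
> >     Configuration conf = HBaseConfiguration.create();
> >     // Raise the split meta-info limit (the default is 10000000 bytes).
> >     conf.setLong("mapreduce.job.split.metainfo.maxsize", 500000000L);
> >     Job job = Job.getInstance(conf, "row-per-split-transform");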
> >
> > thanks again.
> >
> > 2014-11-18 17:59 GMT+01:00 Rohith Sharma K S <rohithsharmaks@huawei.com>:
> >
> >> If log aggregation is enabled, the log folder will be deleted. So I suggest
> >> disabling “yarn.log-aggregation-enable” and running the job again. All the
> >> logs then remain in the log folder, where you can find the container logs.
> >>
> >>
> >>
> >> Thanks & Regards
> >>
> >> Rohith Sharma K S
> >>
> >>
> >>
> >> *From:* francexo83 [mailto:francexo83@gmail.com]
> >> *Sent:* 18 November 2014 22:15
> >> *To:* user@hadoop.apache.org
> >> *Subject:* Re: MR job fails with too many mappers
> >>
> >>
> >>
> >> Hi,
> >>
> >>
> >>
> >> thank you for your quick response, but I was not able to see the logs for
> >> the container.
> >>
> >> I get a "no such file or directory" when I try to access the logs of the
> >> container from the shell:
> >>
> >>
> >>
> >> cd /var/log/hadoop-yarn/containers/application_1416304409718_0032
> >>
> >> It seems that the container has never been created.
> >>
> >> thanks
> >>
> >> 2014-11-18 16:43 GMT+01:00 Rohith Sharma K S <rohithsharmaks@huawei.com>:
> >>
> >> Hi
> >>
> >>
> >>
> >> Could you get the stderr and stdout logs for the container? These logs will
> >> be available in the same location as the container's syslog:
> >>
> >> ${yarn.nodemanager.log-dirs}/<app-id>/<container-id>
> >>
> >> This helps to find the problem!
> >>
> >> Thanks & Regards
> >>
> >> Rohith Sharma K S
> >>
> >>
> >>
> >> *From:* francexo83 [mailto:francexo83@gmail.com]
> >> *Sent:* 18 November 2014 20:53
> >> *To:* user@hadoop.apache.org
> >> *Subject:* MR job fails with too many mappers
> >>
> >>
> >>
> >> Hi All,
> >>
> >>
> >>
> >> I have a small Hadoop cluster with three nodes and HBase 0.98.1 installed
> >> on it.
> >>
> >>
> >>
> >> The Hadoop version is 2.3.0, and my use case scenario is below.
> >>
> >>
> >>
> >> I wrote a MapReduce program that reads data from an HBase table and does
> >> some transformations on that data.
> >>
> >> The jobs are very simple, so they don't need the reduce phase. I also wrote
> >> a TableInputFormat extension in order to maximize the number of concurrent
> >> maps on the cluster.
> >>
> >> In other words, each row should be processed by a single map task.
> >>
> >>
> >>
> >> Everything goes well until the number of rows, and consequently mappers,
> >> exceeds 300000.
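> >>
> >> For illustration, a driver for that kind of map-only job might look roughly
> >> like the sketch below. The class names, the table name, and the mapper body
> >> are placeholders rather than my actual code; the custom input format is the
> >> one-row-per-split class sketched at the top of this thread.
> >>
> >> import java.io.IOException;
> >>
> >> import org.apache.hadoop.conf.Configuration;
> >> import org.apache.hadoop.hbase.HBaseConfiguration;
> >> import org.apache.hadoop.hbase.client.Result;
> >> import org.apache.hadoop.hbase.client.Scan;
> >> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> >> import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> >> import org.apache.hadoop.hbase.mapreduce.TableMapper;
> >> import org.apache.hadoop.io.NullWritable;
> >> import org.apache.hadoop.mapreduce.Job;
> >> import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
> >>
> >> public class RowPerSplitDriver {
> >>
> >>   // Placeholder mapper; the real per-row transformation goes in map().
> >>   public static class TransformMapper
> >>       extends TableMapper<ImmutableBytesWritable, NullWritable> {
> >>     @Override
> >>     protected void map(ImmutableBytesWritable rowKey, Result value, Context context)
> >>         throws IOException, InterruptedException {
> >>       // transform the row here
> >>     }
> >>   }
> >>
> >>   public static void main(String[] args) throws Exception {
> >>     Configuration conf = HBaseConfiguration.create();
> >>     Job job = Job.getInstance(conf, "row-per-split-transform");
> >>     job.setJarByClass(RowPerSplitDriver.class);
> >>
> >>     Scan scan = new Scan();
> >>     scan.setCacheBlocks(false);  // usual setting for MapReduce scans
> >>
> >>     // "my_table" is a placeholder table name.
> >>     TableMapReduceUtil.initTableMapperJob(
> >>         "my_table", scan, TransformMapper.class,
> >>         ImmutableBytesWritable.class, NullWritable.class, job);
> >>
> >>     // Swap in the custom one-row-per-split input format.
> >>     job.setInputFormatClass(RowPerSplitTableInputFormat.class);
> >>
> >>     job.setNumReduceTasks(0);                          // map-only job
> >>     job.setOutputFormatClass(NullOutputFormat.class);  // this sketch writes no job output
> >>
> >>     System.exit(job.waitForCompletion(true) ? 0 : 1);
> >>   }
> >> }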
> >>
> >>
> >>
> >> This is the only exception I see when the job fails:
> >>
> >>
> >>
> >> Application application_1416304409718_0032 failed 2 times due to AM
> >> Container for appattempt_1416304409718_0032_000002 exited with
> >> exitCode: 1 due to:
> >>
> >> Exception from container-launch:
> >> org.apache.hadoop.util.Shell$ExitCodeException:
> >> org.apache.hadoop.util.Shell$ExitCodeException:
> >> at org.apache.hadoop.util.Shell.runCommand(Shell.java:511)
> >> at org.apache.hadoop.util.Shell.run(Shell.java:424)
> >> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:656)
> >> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
> >> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
> >> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
> >> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >> at java.lang.Thread.run(Thread.java:745)
> >>
> >> Container exited with a non-zero exit code 1
> >>
> >> Cluster configuration details:
> >>
> >> Node1: 12 GB, 4 cores
> >>
> >> Node2: 6 GB, 4 cores
> >>
> >> Node3: 6 GB, 4 cores
> >>
> >> yarn.scheduler.minimum-allocation-mb=2048
> >>
> >> yarn.scheduler.maximum-allocation-mb=4096
> >>
> >> yarn.nodemanager.resource.memory-mb=6144
> >>
> >> Regards
> >>
> >>
> >>
> >
>
