hadoop-mapreduce-dev mailing list archives

From Ahmed Radwan <a.aboel...@gmail.com>
Subject MR-279: Flaky errors on a real cluster
Date Wed, 13 Jul 2011 20:44:07 GMT
I am testing MR2 on a small real cluster and am seeing flaky behavior
when running jobs. The exact same job with the same configuration
sometimes runs successfully and sometimes fails with one of the
following errors. As far as I can tell it is random (the job can fail
one time and then run normally the next, and so on).

Has anyone seen this behavior before?

ERROR 1:
--------------
11/07/13 13:21:22 INFO ipc.HadoopYarnRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.mapreduce.v2.api.MRClientProtocol
11/07/13 13:21:22 INFO mapred.ClientServiceDelegate: Connecting to 172.29.5.33:52675
11/07/13 13:21:22 INFO ipc.HadoopYarnRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.mapreduce.v2.api.MRClientProtocol
11/07/13 13:21:23 INFO ipc.Client: Retrying connect to server: /172.29.5.33:52675. Already tried 0 time(s).
11/07/13 13:21:24 INFO ipc.Client: Retrying connect to server: /172.29.5.33:52675. Already tried 1 time(s).
11/07/13 13:21:25 INFO ipc.Client: Retrying connect to server: /172.29.5.33:52675. Already tried 2 time(s).
java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBClientImpl.java:161)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:285)
    at org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:522)
    at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:540)
    at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1130)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1084)
    at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call to /172.29.5.33:52675 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:96)
    at $Proxy9.getTaskAttemptCompletionEvents(Unknown Source)
    at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBClientImpl.java:154)
    ... 18 more

ERROR 2:
--------------
11/07/13 13:32:30 INFO mapred.ClientServiceDelegate: Connecting to 172.29.5.34:41667
11/07/13 13:32:30 INFO ipc.HadoopYarnRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.mapreduce.v2.api.MRClientProtocol
11/07/13 13:32:35 INFO mapreduce.Job: Task Id : attempt_1310587965851_0005_m_000000_0, Status : FAILED
java.io.FileNotFoundException: File file:/tmp/nm-local-dir/usercache/ahmed/appcache/application_1310587965851_0005 does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:412)
    at org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:109)
    at org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:74)
    at org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.<init>(ChecksumFs.java:332)
    at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:367)
    at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:551)
    at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:630)
    at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:627)
    at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2278)
    at org.apache.hadoop.fs.FileContext.create(FileContext.java:627)
    at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2097)
    at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2039)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:81)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:779)


-- 
Ahmed
