hadoop-common-user mailing list archives

From Sidney Simmons <ssimm...@nmitconsulting.co.uk>
Subject Re: Call to namenode fails with java.io.EOFException
Date Fri, 13 May 2011 07:56:54 GMT
All nodes are in sync configuration-wise; we have a few cluster scripts that
ensure this is the case.
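
The idea behind those scripts is roughly to hash the effective configuration
on every node and diff the digests. A minimal sketch of that kind of check
(not our actual script; the class name is made up, and writeXml's property
ordering isn't guaranteed across JVMs, so treat it as illustrative):

import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;

import org.apache.hadoop.conf.Configuration;

public class ConfDigest {
    public static void main(String[] args) throws Exception {
        // Load this node's effective configuration from the classpath
        // (core-site.xml is picked up by default; add the HDFS file too).
        Configuration conf = new Configuration();
        conf.addResource("hdfs-site.xml");

        // Serialise the effective configuration and hash the bytes.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        conf.writeXml(out);
        byte[] md5 = MessageDigest.getInstance("MD5").digest(out.toByteArray());

        // Print the digest; run this on every node and compare outputs.
        StringBuilder hex = new StringBuilder();
        for (byte b : md5) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex);
    }
}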


On 13 May 2011 06:55, Harsh J <harsh@cloudera.com> wrote:

> One of the reasons I can think of is a version mismatch. You may want
> to ensure that the job in question isn't carrying a separate version of
> Hadoop bundled inside it.
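>
> A quick way to verify that might be to print, from a JVM on the cluster
> (or from within a task), which Hadoop build it actually sees and where
> the classes were loaded from. A minimal sketch (the class name here is
> made up for illustration):
>
> import org.apache.hadoop.util.VersionInfo;
>
> public class WhichHadoop {
>     public static void main(String[] args) {
>         // The Hadoop build this JVM is actually running against.
>         System.out.println("version = " + VersionInfo.getVersion());
>         System.out.println("build   = " + VersionInfo.getBuildVersion());
>
>         // Where the Hadoop classes were loaded from. If a job bundles
>         // its own Hadoop jar, this points at the job jar / distributed
>         // cache rather than the cluster's install directory.
>         System.out.println("loaded from = "
>             + VersionInfo.class.getProtectionDomain().getCodeSource().getLocation());
>     }
> }
>
> If the task JVMs report a different version or jar location than the
> namenode does, that would be the mismatch.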
>
> On Fri, May 13, 2011 at 12:42 AM, Sidney Simmons
> <ssimmons@nmitconsulting.co.uk> wrote:
> > Hi there,
> >
> > I'm experiencing some unusual behaviour on our 0.20.2 Hadoop cluster.
> > Intermittently, we're getting "Call to namenode" failures on the
> > tasktrackers, causing tasks to fail:
> >
> > 2011-05-12 14:36:37,462 WARN org.apache.hadoop.mapred.TaskRunner:
> > attempt_201105090819_059_m_0038_0 Child Error
> > java.io.IOException: Call to namenode/10.10.10.10:9000 failed on local
> > exception: java.io.EOFException
> >       at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
> >       at org.apache.hadoop.ipc.Client.call(Client.java:743)
> >       at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> >       at $Proxy5.getFileInfo(Unknown Source)
> >       at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> >       at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> >       at java.lang.reflect.Method.invoke(Unknown Source)
> >       at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> >       at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> >       at $Proxy5.getFileInfo(Unknown Source)
> >       at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615)
> >       at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
> >       at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:210)
> > Caused by: java.io.EOFException
> >       at java.io.DataInputStream.readInt(Unknown Source)
> >       at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
> >       at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
> >
> > The namenode log (logging level = INFO) shows the following a few seconds
> > either side of the above timestamp. It could be relevant, or it could be
> > a coincidence:
> >
> > 2011-05-12 14:36:40,005 INFO org.apache.hadoop.ipc.Server: IPC Server
> > handler 57 on 9000 caught: java.nio.channels.ClosedChannelException
> >       at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source)
> >       at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
> >       at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1213)
> >       at org.apache.hadoop.ipc.Server.access$1900(Server.java:77)
> >       at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:622)
> >       at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:686)
> >       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:997)
> >
> > The jobtracker does, however, have an entry that correlates with the
> > tasktracker's:
> >
> > 2011-05-12 14:36:39,781 INFO org.apache.hadoop.mapred.TaskInProgress: Error
> > from attempt_201105090819_059_m_0038_0: java.io.IOException: Call to
> > namenode/10.10.10.10:9000 failed on local exception: java.io.EOFException
> >       at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
> >       at org.apache.hadoop.ipc.Client.call(Client.java:743)
> >       at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> >       at $Proxy1.getProtocolVersion(Unknown Source)
> >       at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
> >       at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:105)
> >       at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:208)
> >       at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:169)
> >       at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
> >       at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
> >       at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
> >       at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
> >       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
> >       at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
> >       at org.apache.hadoop.mapred.Child.main(Child.java:157)
> > Caused by: java.io.EOFException
> >       at java.io.DataInputStream.readInt(Unknown Source)
> >       at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
> >       at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
> >
> > Can anyone give me any pointers on how to start troubleshooting this
> > issue? It's very sporadic, and we haven't been able to reproduce it yet
> > in our lab. After looking through the mailing list archives, some of the
> > suggestions revolve around raising the following settings (a sketch of
> > how these would look in hdfs-site.xml follows the list):
> >
> > dfs.namenode.handler.count = 128 (currently 64)
> > dfs.datanode.handler.count = 10 (currently 3)
> > dfs.datanode.max.xcievers = 4096 (currently 256)
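> >
> > For illustration, those suggested values would look something like the
> > following in hdfs-site.xml (we haven't validated these numbers
> > ourselves, and the daemons need a restart to pick them up):
> >
> > <property>
> >   <name>dfs.namenode.handler.count</name>
> >   <value>128</value> <!-- RPC handler threads on the namenode -->
> > </property>
> > <property>
> >   <name>dfs.datanode.handler.count</name>
> >   <value>10</value> <!-- RPC handler threads on each datanode -->
> > </property>
> > <!-- "xcievers" is genuinely how the property is spelled -->
> > <property>
> >   <name>dfs.datanode.max.xcievers</name>
> >   <value>4096</value> <!-- concurrent block transfer threads per datanode -->
> > </property>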
> >
> > Any pointers?
> >
> > Thanks in advance
> >
> > Sid Simmons
> > Infrastructure Support Specialist
> >
>
>
>
> --
> Harsh J
>
