hadoop-hdfs-user mailing list archives

From Devaraj k <devara...@huawei.com>
Subject RE: Yarn HDFS and Yarn Exceptions when processing "larger" datasets.
Date Wed, 26 Jun 2013 06:27:23 GMT

   Could you check the network usage in the cluster when this problem occurs? It is probably
caused by high network usage.
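For a quick look at per-interface traffic on a Linux node, something like the following can be run on each machine. This is a minimal sketch that parses `/proc/net/dev`; the column positions are assumed from the standard Linux format, and the class name `NetSample` is just for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class NetSample {
    // Read /proc/net/dev (Linux-only) and report RX/TX byte counters per interface.
    static String sample() throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/proc/net/dev"));
        StringBuilder sb = new StringBuilder();
        for (String line : lines.subList(2, lines.size())) { // skip the two header lines
            String[] parts = line.trim().split("[:\\s]+");
            // parts[0] = interface name, parts[1] = RX bytes, parts[9] = TX bytes
            sb.append(String.format("%s rx=%s tx=%s%n", parts[0], parts[1], parts[9]));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(sample());
    }
}
```

Sampling this twice a few seconds apart and diffing the byte counters gives a rough per-node throughput figure.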

Devaraj k

From: blah blah [mailto:tmp5330@gmail.com]
Sent: 26 June 2013 05:39
To: user@hadoop.apache.org
Subject: Yarn HDFS and Yarn Exceptions when processing "larger" datasets.

Hi All

First, let me apologize for the poor thread title, but I had no idea how to express the problem
in one sentence.

I have implemented a new Application Master on Yarn. I am using an old Yarn development
version: revision 1437315, from 2013-01-23 (SNAPSHOT 3.0.0). I cannot update to the current
trunk version, as the prototype deadline is soon and I don't have time to incorporate the
Yarn API changes. Currently I execute experiments in pseudo-distributed mode, and I use Guava
version 14.0-rc1.

I have a problem with Yarn and HDFS exceptions for "larger" datasets. My AM works fine and
I can execute it without a problem for a debug dataset (1 MB). But when I increase the input
size to 6.8 MB, I get the following exceptions:

Exception in thread "Thread-3" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
    at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.allocate(AMRMProtocolPBClientImpl.java:77)
    at org.apache.hadoop.yarn.client.AMRMClientImpl.allocate(AMRMClientImpl.java:194)
    at org.tudelft.ludograph.app.AppMasterContainerRequester.sendContainerAskToRM(AppMasterContainerRequester.java:219)
    at org.tudelft.ludograph.app.AppMasterContainerRequester.run(AppMasterContainerRequester.java:315)
    at java.lang.Thread.run(Thread.java:662)
Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception:
java.io.IOException: Response is null.; Host Details : local host is: "linux-ljc5.site/<>";
destination host is: "":8030;
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:212)
    at $Proxy10.allocate(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.allocate(AMRMProtocolPBClientImpl.java:75)
    ... 4 more
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Response is
null.; Host Details : local host is: "linux-ljc5.site/<>";
destination host is: "":8030;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:760)
    at org.apache.hadoop.ipc.Client.call(Client.java:1240)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    ... 6 more
Caused by: java.io.IOException: Response is null.
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:950)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)

Exception in thread "org.apache.hadoop.hdfs.SocketCache@6da0d866"
java.lang.NoSuchMethodError: com.google.common.collect.LinkedListMultimap.values()Ljava/util/List;
    at org.apache.hadoop.hdfs.SocketCache.clear(SocketCache.java:257)
    at org.apache.hadoop.hdfs.SocketCache.access$100(SocketCache.java:45)
    at org.apache.hadoop.hdfs.SocketCache$1.run(SocketCache.java:126)
    at java.lang.Thread.run(Thread.java:662)
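A `NoSuchMethodError` out of Guava like the one above usually means the Guava jar on the runtime classpath is not the version Hadoop was compiled against; Hadoop builds of that era shipped Guava 11.0.2, and 14.0-rc1 is not binary-compatible with code built against it. If that is the cause here, pinning the dependency back (shown as a Maven fragment, assuming a Maven build) might look like:

```xml
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>11.0.2</version>
</dependency>
```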

As I said, this problem does not occur for the 1 MB input. For the 6.8 MB input nothing is
changed except the input dataset. Now a little about what I am doing, to give you the context
of the problem. My AM starts N (debug: 4) containers, and each container reads its part of the
input data. When this process is finished, the containers exchange parts of the input between
themselves (exchanging IDs of input structures, to provide a means of communication between
data structures). These exceptions occur during the ID-exchange process. I start a Netty
server/client on each container and use ports 12000-12099 as the means of communicating these IDs.
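One thing worth checking on the Yarn side of this: the AM has to keep calling `AMRMClient.allocate()` at a steady interval, even while the containers are busy with the ID exchange; if the requester thread blocks on Netty I/O for too long, the RM connection can fail in roughly the way the first trace shows. A minimal, self-contained sketch of decoupling the heartbeat onto its own scheduler (`allocate` below is a hypothetical stub standing in for the real call to the ResourceManager on port 8030):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class HeartbeatSketch {
    // Counts heartbeats; in the real AM this would be AMRMClient.allocate(progress).
    static final AtomicInteger heartbeats = new AtomicInteger();

    static void allocate(float progress) {
        heartbeats.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        // Heartbeat every 250 ms, independently of whatever the worker threads are
        // doing, so the RM never sees the AM go silent during the ID exchange.
        ses.scheduleAtFixedRate(() -> allocate(0.5f), 0, 250, TimeUnit.MILLISECONDS);
        Thread.sleep(1100); // stand-in for the long-running exchange phase
        ses.shutdownNow();
        System.out.println("heartbeats sent: " + heartbeats.get());
    }
}
```

The interval and progress value are placeholders; the point is only that the allocate loop should not share a thread with blocking container-to-container I/O.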
Any help will be greatly appreciated. Sorry for any typos, and if the explanation is not clear,
just ask for whatever details you are interested in. It is currently after 2 AM; I hope that is
a valid excuse.
