hadoop-hdfs-user mailing list archives

From Omkar Joshi <ojo...@hortonworks.com>
Subject Re: Yarn HDFS and Yarn Exceptions when processing "larger" datasets.
Date Mon, 01 Jul 2013 18:22:39 GMT
Also, do you see any exceptions in the RM / NM logs?

Thanks,
Omkar Joshi
*Hortonworks Inc.* <http://www.hortonworks.com>


On Mon, Jul 1, 2013 at 11:19 AM, Omkar Joshi <ojoshi@hortonworks.com> wrote:

> Hi,
>
> I don't know your complete AM code or how your containers communicate
> with each other, but here are a few things that might help you debug.
> Check where your RM is actually listening (is the scheduler really on
> port 8030? Are you sure a previously started RM is not still running
> there?). Also, in yarn-site.xml, can you try changing the RM scheduler
> address to something like "localhost:<free-port-but-not-default>" and
> increasing the number of client threads that handle AM requests? Only
> your AM is expected to communicate with the RM over the AM-RM protocol
> (see the sketch after the config snippet below). By any chance, are your
> containers talking to the RM directly over the AM-RM protocol?
>
>   <property>
>     <description>The address of the scheduler interface.</description>
>     <name>yarn.resourcemanager.scheduler.address</name>
>     <value>${yarn.resourcemanager.hostname}:8030</value>
>   </property>
>
>   <property>
>     <description>Number of threads to handle scheduler interface.</description>
>     <name>yarn.resourcemanager.scheduler.client.thread-count</name>
>     <value>50</value>
>   </property>
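>
> For reference, here is a minimal sketch of the expected pattern, assuming
> the AMRMClient API from your snapshot (class and method names may differ
> slightly at your revision, so treat this as illustrative rather than
> exact):
>
>   import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
>   import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
>   import org.apache.hadoop.yarn.client.AMRMClient;
>   import org.apache.hadoop.yarn.client.AMRMClientImpl;
>   import org.apache.hadoop.yarn.conf.YarnConfiguration;
>
>   // appAttemptId is recovered from the container environment by your AM.
>   void registerAndHeartbeat(ApplicationAttemptId appAttemptId)
>       throws Exception {
>     // Only the AM should create this client; containers must never talk
>     // to the RM over the AM-RM protocol.
>     AMRMClient amrmClient = new AMRMClientImpl(appAttemptId);
>     amrmClient.init(new YarnConfiguration()); // reads the scheduler address
>     amrmClient.start();
>     amrmClient.registerApplicationMaster("", 0, "");
>
>     // A single thread in the AM drives the allocate/heartbeat loop.
>     AllocateResponse response = amrmClient.allocate(0.0f);
>   }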
>
>
> Thanks,
> Omkar Joshi
> *Hortonworks Inc.* <http://www.hortonworks.com>
>
>
> On Fri, Jun 28, 2013 at 5:35 AM, blah blah <tmp5330@gmail.com> wrote:
>
>> Hi
>>
>> Sorry to reply so late. I don't have the data you requested (I have no
>> time; my deadline is within 3 days). However, I have observed that this
>> issue occurs not only for the "larger" datasets (6.8MB) but for all
>> datasets and all jobs in general. For smaller datasets (1MB) the AM does
>> not throw the exception; only the containers throw exceptions (the same
>> as in my previous e-mail). When these exceptions are thrown, my code (AM
>> and containers) performs no operations on HDFS; it only performs
>> in-memory computation and communication. I have also observed that these
>> exceptions occur at "random"; I couldn't find any pattern. I can execute
>> a job successfully, then resubmit the job to repeat the experiment, and
>> the exceptions occur again (with no change to the source code, input
>> dataset, or execution/input parameters).
>>
>> As for the high network usage: as I said, I don't have the data. But
>> YARN is running on nodes that are exclusive to my experiments; no other
>> software runs on them (only the OS and YARN). Besides, I don't think 20
>> containers working on a 1MB dataset (in total) can be called high
>> network usage.
>>
>> regards
>> tmp
>>
>>
>>
>> 2013/6/26 Devaraj k <devaraj.k@huawei.com>
>>
>>> Hi,
>>>
>>> Could you check the network usage in the cluster when this problem
>>> occurs? It is probably caused by high network usage.
>>>
>>> Thanks
>>> Devaraj k
>>>
>>> *From:* blah blah [mailto:tmp5330@gmail.com]
>>> *Sent:* 26 June 2013 05:39
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Yarn HDFS and Yarn Exceptions when processing "larger"
>>> datasets.
>>>
>>> Hi All
>>>
>>> First, let me apologize for the poor thread title, but I have no idea
>>> how to express the problem in one sentence.
>>>
>>> I have implemented a new Application Master on top of YARN. I am using
>>> an old YARN development version: revision 1437315, from 2013-01-23
>>> (3.0.0-SNAPSHOT). I cannot update to the current trunk version, as the
>>> prototype deadline is soon and I don't have time to incorporate the
>>> YARN API changes.
>>>
>>> Currently I run my experiments in pseudo-distributed mode, and I use
>>> Guava version 14.0-rc1. I have a problem with YARN and HDFS exceptions
>>> for "larger" datasets. My AM works fine and I can execute it without a
>>> problem for a debug dataset (1MB in size). But when I increase the
>>> input size to 6.8 MB, I get the following exceptions:
>>>
>>> AM_Exceptions_Stack
>>>
>>> Exception in thread "Thread-3"
>>> java.lang.reflect.UndeclaredThrowableException
>>>     at
>>> org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
>>>     at
>>> org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.allocate(AMRMProtocolPBClientImpl.java:77)
>>>     at
>>> org.apache.hadoop.yarn.client.AMRMClientImpl.allocate(AMRMClientImpl.java:194)
>>>     at
>>> org.tudelft.ludograph.app.AppMasterContainerRequester.sendContainerAskToRM(AppMasterContainerRequester.java:219)
>>>     at
>>> org.tudelft.ludograph.app.AppMasterContainerRequester.run(AppMasterContainerRequester.java:315)
>>>     at java.lang.Thread.run(Thread.java:662)
>>> Caused by: com.google.protobuf.ServiceException: java.io.IOException:
>>> Failed on local exception: java.io.IOException: Response is null.; Host
>>> Details : local host is: "linux-ljc5.site/127.0.0.1"; destination host
>>> is: "0.0.0.0":8030;
>>>     at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:212)
>>>     at $Proxy10.allocate(Unknown Source)
>>>     at
>>> org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.allocate(AMRMProtocolPBClientImpl.java:75)
>>>     ... 4 more
>>> Caused by: java.io.IOException: Failed on local exception:
>>> java.io.IOException: Response is null.; Host Details : local host is:
>>> "linux-ljc5.site/127.0.0.1"; destination host is: "0.0.0.0":8030;
>>>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:760)
>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1240)
>>>     at
>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>>     ... 6 more
>>> Caused by: java.io.IOException: Response is null.
>>>     at
>>> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:950)
>>>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
>>>
>>> Container_Exception
>>>
>>> Exception in thread "org.apache.hadoop.hdfs.SocketCache@6da0d866"
>>> java.lang.NoSuchMethodError:
>>> com.google.common.collect.LinkedListMultimap.values()Ljava/util/List;
>>>     at org.apache.hadoop.hdfs.SocketCache.clear(SocketCache.java:257)
>>>     at org.apache.hadoop.hdfs.SocketCache.access$100(SocketCache.java:45)
>>>     at org.apache.hadoop.hdfs.SocketCache$1.run(SocketCache.java:126)
>>>     at java.lang.Thread.run(Thread.java:662)
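>>>
>>> In case this is a classpath issue, one quick (illustrative) check I can
>>> run in a container is to print which Guava jar the failing class was
>>> actually loaded from:
>>>
>>>     // Prints the jar that provided LinkedListMultimap at runtime.
>>>     System.out.println(
>>>         com.google.common.collect.LinkedListMultimap.class
>>>             .getProtectionDomain().getCodeSource().getLocation());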
>>>
>>> As I said, this problem does not occur for the 1MB input. For the 6.8MB
>>> input nothing changes except the input dataset. Now a little about what
>>> I am doing, to give you the context of the problem. My AM starts N
>>> (4 in debug) containers, and each container reads its part of the input
>>> data. When this process finishes, the containers exchange parts of the
>>> input (the IDs of input structures, to provide a means of communication
>>> between data structures). These exceptions occur during this ID
>>> exchange. I start a Netty server/client on each container and use ports
>>> 12000-12099 to communicate the IDs (see the sketch below).
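>>>
>>> For context, the server side on each container is roughly the following
>>> (a simplified sketch assuming Netty 4.x; IdExchangeHandler and the port
>>> selection are illustrative placeholders, not my exact code):
>>>
>>>     import io.netty.bootstrap.ServerBootstrap;
>>>     import io.netty.channel.ChannelInitializer;
>>>     import io.netty.channel.EventLoopGroup;
>>>     import io.netty.channel.nio.NioEventLoopGroup;
>>>     import io.netty.channel.socket.SocketChannel;
>>>     import io.netty.channel.socket.nio.NioServerSocketChannel;
>>>
>>>     void startIdServer(int containerIndex) throws InterruptedException {
>>>         EventLoopGroup boss = new NioEventLoopGroup(1);
>>>         EventLoopGroup worker = new NioEventLoopGroup();
>>>         ServerBootstrap b = new ServerBootstrap();
>>>         b.group(boss, worker)
>>>          .channel(NioServerSocketChannel.class)
>>>          .childHandler(new ChannelInitializer<SocketChannel>() {
>>>              @Override
>>>              protected void initChannel(SocketChannel ch) {
>>>                  // Placeholder for the handler that exchanges IDs.
>>>                  ch.pipeline().addLast(new IdExchangeHandler());
>>>              }
>>>          });
>>>         // Each container binds one port from the 12000-12099 range.
>>>         b.bind(12000 + containerIndex).sync();
>>>     }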
>>>
>>> Any help will be greatly appreciated. Sorry for any typos; if the
>>> explanation is not clear, just ask for any details you are interested
>>> in. It is currently after 2 AM, which I hope is a valid excuse.
>>>
>>> regards
>>>
>>> tmp
>>>
>>
>>
>
