hadoop-mapreduce-issues mailing list archives

From "Vinod Kumar Vavilapalli (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
Date Wed, 09 Nov 2011 12:53:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147008#comment-13147008
] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3333:
----------------------------------------------------

bq. The close call shouldn't really be required with the idle time set to 0.
My idea was to actually remove the maxIdleTime setting once the root issue HADOOP-7317 is
fixed. I'll let it be.
bq. Should RPCClientFactoryPBImpl call RPC.stopProxy instead of putting it in all the service
client impls? It's a PB specific factory, so putting it here should be ok.
No, that isn't possible. We need access to the proxy object in each impl. That is the bane of
the multiple layers in this part of the code.
bq. Otherwise - the Exception in stopClient() should not be ignored.
Sure, I'll throw an exception so that it is clear if somebody calls stopClient() for a protocol
that doesn't implement it.
bq. The client cache (removed by the patch) in ContainerLauncherImpl would still be useful
in non-secure mode. This works for both though - so isn't high priority. Maybe a separate
jira.
Sure, but it helps to have the same implementation for both modes. Separate JIRA if someone needs it.
bq. Forgot to mention - nice clean workaround to the rpc stop not working. Thought it'd be
way more involved.
Yeah, I have been running with this workaround for nearly a week but didn't put it in the patch
in the hope of fixing the root cause. Turns out that it is the only short-term solution, alas.
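
For readers following the stopClient() point above, here is a minimal sketch of the "don't swallow the exception" behaviour. It is not the code from the attached patch. The class names `StopClientSketch`, `ClosableClient` and `NonClosableClient` are hypothetical stand-ins, and the reflective lookup of a `close()` method is an assumption about how a factory could reach a per-impl close without knowing the concrete type.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class StopClientSketch {

    /** Hypothetical stand-in for a client impl that owns its RPC proxy. */
    public static class ClosableClient {
        private boolean closed = false;

        // A real impl would release its proxy here (e.g. via RPC.stopProxy).
        public void close() {
            closed = true;
        }

        public boolean isClosed() {
            return closed;
        }
    }

    /** Hypothetical client for a protocol that never added close(). */
    public static class NonClosableClient {
    }

    /**
     * Invokes close() on the client and fails loudly when the client
     * does not provide one, instead of silently ignoring the problem.
     */
    public static void stopClient(Object client) {
        try {
            Method close = client.getClass().getMethod("close");
            close.invoke(client);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(
                "Client " + client.getClass().getName() + " does not implement close()", e);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(
                "Failed to close client " + client.getClass().getName(), e);
        }
    }

    public static void main(String[] args) {
        ClosableClient ok = new ClosableClient();
        stopClient(ok);
        System.out.println("closed = " + ok.isClosed());

        try {
            stopClient(new NonClosableClient());
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Throwing here makes a missing close() visible at the first call, which matters when every unclosed proxy can keep a connection thread alive.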

                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.txt
>
>
> [~Karams] just found this. The usual sort job on a 350-node cluster hung due to an OutOfMemoryError
and eventually failed after an hour instead of finishing in the usual 20-odd minutes.
> {code}
> 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002_01_001434 : java.lang.reflect.UndeclaredThrowableException
>         at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88)
>         at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "gsbl91281.blue.ygrid.yahoo.com/98.137.101.189"; destination host is: ""gsbl91525.blue.ygrid.yahoo.com":45450;
>         at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
>         at $Proxy20.startContainer(Unknown Source)
>         at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81)
>         ... 4 more
> Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "gsbl91281.blue.ygrid.yahoo.com/98.137.101.189"; destination host is: ""gsbl91525.blue.ygrid.yahoo.com":45450;
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1089)
>         at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136)
>         ... 6 more
> Caused by: java.io.IOException: Couldn't set up IO streams
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621)
>         at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1065)
>         ... 7 more
> Caused by: java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:597)
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:614)
>         ... 10 more
> {code}
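
The innermost cause in the trace above is the JVM failing to start a native thread from Client$Connection.setupIOstreams, not heap exhaustion. As a rough diagnostic sketch, assuming only the standard `java.lang.management` API, the following shows how the live thread count of a process can be watched while threads pile up and after they are released. The `fake-ipc-connection-` threads and the class name `ThreadCountSketch` are hypothetical and only imitate connections that are never closed.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.CountDownLatch;

public class ThreadCountSketch {

    /** Number of live threads (daemon and non-daemon) in this JVM. */
    public static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** Starts n threads that block until the latch is released. */
    public static Thread[] parkThreads(int n, CountDownLatch release) {
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            // Hypothetical name: stands in for one lingering connection thread.
            threads[i] = new Thread(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "fake-ipc-connection-" + i);
            threads[i].setDaemon(true);
            threads[i].start();
        }
        return threads;
    }

    public static void main(String[] args) throws InterruptedException {
        int before = liveThreads();

        CountDownLatch release = new CountDownLatch(1);
        Thread[] threads = parkThreads(50, release);
        int during = liveThreads();

        // Releasing the latch plays the role of closing the connections.
        release.countDown();
        for (Thread t : threads) {
            t.join();
        }
        int after = liveThreads();

        System.out.println("before=" + before + " during=" + during + " after=" + after);
    }
}
```

A count that keeps climbing as containers are launched and never falls back would be consistent with connection threads that are not being stopped.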

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
