hadoop-common-issues mailing list archives

From "Steven Rand (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-14062) ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled
Date Tue, 07 Feb 2017 03:26:41 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855170#comment-15855170 ]

Steven Rand edited comment on HADOOP-14062 at 2/7/17 3:25 AM:
--------------------------------------------------------------

[~jianhe], the server logs cover a different time range, but correspond to another occurrence
of the same problem. I have never seen any errors or warnings in the RM's log when this problem
occurs -- it appears to be entirely client-side. I can reproduce the issue again and attach AM
and RM logs from the same time if that would be helpful, but the contents of the RM log will be
the same as in the current attachment.

I will write a unit test, but I'm also hoping for feedback on whether the approach taken in
the current patch makes sense at all. I can't tell whether the problem is that the unwrapped
input stream needs to be wrapped in a {{BufferedInputStream}}, or whether it's that the first
four bytes of the unwrapped input stream, which are supposed to be the length of the response,
are instead something else.
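To make the second hypothesis concrete, here is a minimal sketch of how a four-byte big-endian length prefix is read with {{DataInputStream.readInt}}, and how a truncated or misaligned stream produces exactly the {{EOFException}} seen in the stack trace below. This is not Hadoop's actual {{IpcStreams}} code or the patch; the class name and byte values are illustrative only:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical sketch -- NOT Hadoop's IpcStreams code. It only illustrates
// how a 4-byte big-endian length prefix is read, and what happens when the
// prefix is truncated or misaligned.
public class LengthPrefixDemo {

    // Read the 4-byte length prefix the way DataInputStream.readInt does
    // inside Client$IpcStreams.readResponse (see the stack trace below).
    static int readLength(byte[] frame) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame))) {
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        // Well-formed frame: length prefix 5, followed by 5 payload bytes.
        byte[] good = {0, 0, 0, 5, 'h', 'e', 'l', 'l', 'o'};
        System.out.println("length = " + readLength(good)); // prints "length = 5"

        // Misaligned frame: payload bytes where the prefix should be.
        // readInt happily interprets "hell" (0x68656C6C) as a length.
        byte[] misaligned = {'h', 'e', 'l', 'l', 'o'};
        System.out.println("misread length = " + readLength(misaligned));

        // Truncated frame: fewer than 4 bytes available before EOF. This is
        // the "Caused by: java.io.EOFException ... readInt" path.
        try {
            readLength(new byte[]{0, 0});
        } catch (EOFException e) {
            System.out.println("EOFException on truncated frame");
        }
    }
}
```

If unwrapping misplaces or drops bytes, either of the last two cases can occur: a bogus length followed by a short read, or an outright EOF on the prefix itself.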

EDIT: I forgot to say that yes, I have tested my patch using TestDFSIO, and as far as I can
tell it resolves the issue.



> ApplicationMasterProtocolPBClientImpl.allocate fails with EOFException when RPC privacy is enabled
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-14062
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14062
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Steven Rand
>            Priority: Critical
>         Attachments: YARN-6013-branch-2.8.0.002.patch, yarn-rm-log.txt
>
>
> When privacy is enabled for RPC (hadoop.rpc.protection = privacy), {{ApplicationMasterProtocolPBClientImpl.allocate}} sometimes (but not always) fails with an EOFException. I've reproduced this with Spark 2.0.2 built against latest branch-2.8 and with a simple distcp job on latest branch-2.8.
> Steps to reproduce using distcp:
> 1. Set hadoop.rpc.protection equal to privacy
> 2. Write data to HDFS. I did this with Spark as follows: 
> {code}
> sc.parallelize(1 to (5*1024*1024)).map(k => Seq(k, org.apache.commons.lang.RandomStringUtils.random(1024, "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWxyZ0123456789")).mkString("|")).toDF().repartition(100).write.parquet("hdfs:///tmp/testData")
> {code}
> 3. Attempt to distcp that data to another location in HDFS. For example:
> {code}
> hadoop distcp -Dmapreduce.framework.name=yarn hdfs:///tmp/testData hdfs:///tmp/testDataCopy
> {code}
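> For reference, step 1 above corresponds to the following {{core-site.xml}} entry (assuming the standard configuration file layout; this must be set consistently on clients and servers):
> {code}
> <property>
>   <name>hadoop.rpc.protection</name>
>   <value>privacy</value>
> </property>
> {code}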
> I observed this error in the ApplicationMaster's syslog:
> {code}
> 2016-12-19 19:13:50,097 INFO [eventHandlingThread] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer setup for JobId: job_1482189777425_0004, File: hdfs://<namenode_host>:8020/tmp/hadoop-yarn/staging/<hdfs_user>/.staging/job_1482189777425_0004/job_1482189777425_0004_1.jhist
> 2016-12-19 19:13:51,004 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:0 ScheduledMaps:4 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
> 2016-12-19 19:13:51,031 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1482189777425_0004: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:22528, vCores:23> knownNMs=3
> 2016-12-19 19:13:52,043 INFO [RMCommunicator Allocator] org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking ApplicationMasterProtocolPBClientImpl.allocate over null. Retrying after sleeping for 30000ms.
> java.io.EOFException: End of File Exception between local host is: "<application_master_host>/<ip_addr>"; destination host is: "<rm_host>":8030; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> 	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
> 	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1486)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1428)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1338)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> 	at com.sun.proxy.$Proxy80.allocate(Unknown Source)
> 	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:335)
> 	at com.sun.proxy.$Proxy81.allocate(Unknown Source)
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.makeRemoteRequest(RMContainerRequestor.java:204)
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:735)
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:269)
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:281)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.EOFException
> 	at java.io.DataInputStream.readInt(DataInputStream.java:392)
> 	at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1785)
> 	at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1156)
> 	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1053)
> {code}
> Marking as "critical" since this blocks YARN users from encrypting RPC in their Hadoop clusters.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

