hadoop-common-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11959) WASB should configure client side socket timeout in storage client blob request options
Date Fri, 29 May 2015 11:55:21 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564627#comment-14564627 ]

Hudson commented on HADOOP-11959:
---------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #212 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/212/])
HADOOP-11959. WASB should configure client side socket timeout in storage client blob request options. Contributed by Ivan Mitic. (cnauroth: rev 94e7d49a6dab7e7f4e873dcca67e7fcc98e7e1f8)
* hadoop-common-project/hadoop-common/CHANGES.txt
* hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azure/TestAzureFileSystemErrorConditions.java
* hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/StorageInterfaceImpl.java
* hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azure/TestBlobDataValidation.java
* hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/AzureNativeFileSystemStore.java
* hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azure/MockStorageInterface.java
* hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/StorageInterface.java
* hadoop-project/pom.xml


> WASB should configure client side socket timeout in storage client blob request options
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11959
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11959
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>             Fix For: 2.8.0
>
>         Attachments: HADOOP-11959.2.patch, HADOOP-11959.patch
>
>
> On clusters/jobs where {{mapred.task.timeout}} is set to a larger value, we noticed that tasks can sometimes get stuck on the below stack.
> {code}
> Thread 1: (state = IN_NATIVE)
> - java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Interpreted frame)
> - java.net.SocketInputStream.read(byte[], int, int, int) @bci=87, line=152 (Interpreted frame)
> - java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=122 (Interpreted frame)
> - java.io.BufferedInputStream.fill() @bci=175, line=235 (Interpreted frame)
> - java.io.BufferedInputStream.read1(byte[], int, int) @bci=44, line=275 (Interpreted frame)
> - java.io.BufferedInputStream.read(byte[], int, int) @bci=49, line=334 (Interpreted frame)
> - sun.net.www.MeteredStream.read(byte[], int, int) @bci=16, line=134 (Interpreted frame)
> - java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 (Interpreted frame)
> - sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(byte[], int, int) @bci=4, line=3053 (Interpreted frame)
> - com.microsoft.azure.storage.core.NetworkInputStream.read(byte[], int, int) @bci=7, line=49 (Interpreted frame)
> - com.microsoft.azure.storage.blob.CloudBlob$10.postProcessResponse(java.net.HttpURLConnection, com.microsoft.azure.storage.blob.CloudBlob, com.microsoft.azure.storage.blob.CloudBlobClient, com.microsoft.azure.storage.OperationContext, java.lang.Integer) @bci=204, line=1691 (Interpreted frame)
> - com.microsoft.azure.storage.blob.CloudBlob$10.postProcessResponse(java.net.HttpURLConnection, java.lang.Object, java.lang.Object, com.microsoft.azure.storage.OperationContext, java.lang.Object) @bci=17, line=1613 (Interpreted frame)
> - com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(java.lang.Object, java.lang.Object, com.microsoft.azure.storage.core.StorageRequest, com.microsoft.azure.storage.RetryPolicyFactory, com.microsoft.azure.storage.OperationContext) @bci=352, line=148 (Interpreted frame)
> - com.microsoft.azure.storage.blob.CloudBlob.downloadRangeInternal(long, java.lang.Long, byte[], int, com.microsoft.azure.storage.AccessCondition, com.microsoft.azure.storage.blob.BlobRequestOptions, com.microsoft.azure.storage.OperationContext) @bci=131, line=1468 (Interpreted frame)
> - com.microsoft.azure.storage.blob.BlobInputStream.dispatchRead(int) @bci=31, line=255 (Interpreted frame)
> - com.microsoft.azure.storage.blob.BlobInputStream.readInternal(byte[], int, int) @bci=52, line=448 (Interpreted frame)
> - com.microsoft.azure.storage.blob.BlobInputStream.read(byte[], int, int) @bci=28, line=420 (Interpreted frame)
> - java.io.BufferedInputStream.read1(byte[], int, int) @bci=39, line=273 (Interpreted frame)
> - java.io.BufferedInputStream.read(byte[], int, int) @bci=49, line=334 (Interpreted frame)
> - java.io.DataInputStream.read(byte[], int, int) @bci=7, line=149 (Interpreted frame)
> - org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsInputStream.read(byte[], int, int) @bci=10, line=734 (Interpreted frame)
> - java.io.BufferedInputStream.read1(byte[], int, int) @bci=39, line=273 (Interpreted frame)
> - java.io.BufferedInputStream.read(byte[], int, int) @bci=49, line=334 (Interpreted frame)
> - java.io.DataInputStream.read(byte[]) @bci=8, line=100 (Interpreted frame)
> - org.apache.hadoop.util.LineReader.fillBuffer(java.io.InputStream, byte[], boolean) @bci=2, line=180 (Interpreted frame)
> - org.apache.hadoop.util.LineReader.readDefaultLine(org.apache.hadoop.io.Text, int, int) @bci=64, line=216 (Compiled frame)
> - org.apache.hadoop.util.LineReader.readLine(org.apache.hadoop.io.Text, int, int) @bci=19, line=174 (Interpreted frame)
> - org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue() @bci=108, line=185 (Interpreted frame)
> - org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue() @bci=13, line=553 (Interpreted frame)
> - org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue() @bci=4, line=80 (Interpreted frame)
> - org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue() @bci=4, line=91 (Interpreted frame)
> - org.apache.hadoop.mapreduce.Mapper.run(org.apache.hadoop.mapreduce.Mapper$Context) @bci=6, line=144 (Interpreted frame)
> - org.apache.hadoop.mapred.MapTask.runNewMapper(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapreduce.split.JobSplit$TaskSplitIndex, org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter) @bci=228, line=784 (Interpreted frame)
> - org.apache.hadoop.mapred.MapTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=148, line=341 (Interpreted frame)
> - org.apache.hadoop.mapred.YarnChild$2.run() @bci=29, line=163 (Interpreted frame)
> - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
> - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=415 (Interpreted frame)
> - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1628 (Interpreted frame)
> - org.apache.hadoop.mapred.YarnChild.main(java.lang.String[]) @bci=514, line=158 (Interpreted frame)
> {code}
> The issue is that, by default, the storage client does not set a socket timeout on its HTTP connections, so in some (rare) circumstances, e.g. when the server on the other side dies unexpectedly, the read blocks indefinitely.
> The fix is to configure the maximum operation time on the storage client request options.
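> For illustration, a minimal sketch of that kind of change using the azure-storage Java SDK's {{BlobRequestOptions}}. This is not the committed patch; the connection-string setup and the five-minute cap are assumptions for the example.
> {code}
> import com.microsoft.azure.storage.CloudStorageAccount;
> import com.microsoft.azure.storage.blob.CloudBlobClient;
>
> public class BlobTimeoutSketch {
>     public static void main(String[] args) throws Exception {
>         // Hypothetical credentials source; substitute a real connection string.
>         CloudStorageAccount account = CloudStorageAccount.parse(
>             System.getenv("AZURE_STORAGE_CONNECTION_STRING"));
>         CloudBlobClient client = account.createCloudBlobClient();
>
>         // Bound the total wall-clock time of a single storage operation so a
>         // read against a dead peer eventually fails instead of blocking
>         // forever in socketRead0. The 5-minute value is illustrative only.
>         client.getDefaultRequestOptions().setMaximumExecutionTimeInMs(5 * 60 * 1000);
>     }
> }
> {code}
> With a default like this in place, blob operations issued through {{client}} inherit the cap unless a per-call {{BlobRequestOptions}} overrides it.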




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
