hadoop-common-issues mailing list archives

From "Ivan Mitic (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11959) WASB should configure client side socket timeout in storage client blob request options
Date Thu, 28 May 2015 19:19:21 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563513#comment-14563513 ]

Ivan Mitic commented on HADOOP-11959:
-------------------------------------

Thanks, Chris, for reviewing!

bq. Now that the new SDK version has fixed the bug, do we need to remove this code too, or is this part of the fix permanent?
Good question. Blob metadata properties are not encoded by the client library; I asked the same question back when we discussed the encoding issue with the client SDK team.

> WASB should configure client side socket timeout in storage client blob request options
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11959
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11959
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>         Attachments: HADOOP-11959.2.patch, HADOOP-11959.patch
>
>
> On clusters/jobs where {{mapred.task.timeout}} is set to a larger value, we noticed that tasks can sometimes get stuck on the stack below.
> {code}
> Thread 1: (state = IN_NATIVE)
> - java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Interpreted frame)
> - java.net.SocketInputStream.read(byte[], int, int, int) @bci=87, line=152 (Interpreted frame)
> - java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=122 (Interpreted frame)
> - java.io.BufferedInputStream.fill() @bci=175, line=235 (Interpreted frame)
> - java.io.BufferedInputStream.read1(byte[], int, int) @bci=44, line=275 (Interpreted frame)
> - java.io.BufferedInputStream.read(byte[], int, int) @bci=49, line=334 (Interpreted frame)
> - sun.net.www.MeteredStream.read(byte[], int, int) @bci=16, line=134 (Interpreted frame)
> - java.io.FilterInputStream.read(byte[], int, int) @bci=7, line=133 (Interpreted frame)
> - sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(byte[], int, int) @bci=4, line=3053 (Interpreted frame)
> - com.microsoft.azure.storage.core.NetworkInputStream.read(byte[], int, int) @bci=7, line=49 (Interpreted frame)
> - com.microsoft.azure.storage.blob.CloudBlob$10.postProcessResponse(java.net.HttpURLConnection, com.microsoft.azure.storage.blob.CloudBlob, com.microsoft.azure.storage.blob.CloudBlobClient, com.microsoft.azure.storage.OperationContext, java.lang.Integer) @bci=204, line=1691 (Interpreted frame)
> - com.microsoft.azure.storage.blob.CloudBlob$10.postProcessResponse(java.net.HttpURLConnection, java.lang.Object, java.lang.Object, com.microsoft.azure.storage.OperationContext, java.lang.Object) @bci=17, line=1613 (Interpreted frame)
> - com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(java.lang.Object, java.lang.Object, com.microsoft.azure.storage.core.StorageRequest, com.microsoft.azure.storage.RetryPolicyFactory, com.microsoft.azure.storage.OperationContext) @bci=352, line=148 (Interpreted frame)
> - com.microsoft.azure.storage.blob.CloudBlob.downloadRangeInternal(long, java.lang.Long, byte[], int, com.microsoft.azure.storage.AccessCondition, com.microsoft.azure.storage.blob.BlobRequestOptions, com.microsoft.azure.storage.OperationContext) @bci=131, line=1468 (Interpreted frame)
> - com.microsoft.azure.storage.blob.BlobInputStream.dispatchRead(int) @bci=31, line=255 (Interpreted frame)
> - com.microsoft.azure.storage.blob.BlobInputStream.readInternal(byte[], int, int) @bci=52, line=448 (Interpreted frame)
> - com.microsoft.azure.storage.blob.BlobInputStream.read(byte[], int, int) @bci=28, line=420 (Interpreted frame)
> - java.io.BufferedInputStream.read1(byte[], int, int) @bci=39, line=273 (Interpreted frame)
> - java.io.BufferedInputStream.read(byte[], int, int) @bci=49, line=334 (Interpreted frame)
> - java.io.DataInputStream.read(byte[], int, int) @bci=7, line=149 (Interpreted frame)
> - org.apache.hadoop.fs.azure.NativeAzureFileSystem$NativeAzureFsInputStream.read(byte[], int, int) @bci=10, line=734 (Interpreted frame)
> - java.io.BufferedInputStream.read1(byte[], int, int) @bci=39, line=273 (Interpreted frame)
> - java.io.BufferedInputStream.read(byte[], int, int) @bci=49, line=334 (Interpreted frame)
> - java.io.DataInputStream.read(byte[]) @bci=8, line=100 (Interpreted frame)
> - org.apache.hadoop.util.LineReader.fillBuffer(java.io.InputStream, byte[], boolean) @bci=2, line=180 (Interpreted frame)
> - org.apache.hadoop.util.LineReader.readDefaultLine(org.apache.hadoop.io.Text, int, int) @bci=64, line=216 (Compiled frame)
> - org.apache.hadoop.util.LineReader.readLine(org.apache.hadoop.io.Text, int, int) @bci=19, line=174 (Interpreted frame)
> - org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue() @bci=108, line=185 (Interpreted frame)
> - org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue() @bci=13, line=553 (Interpreted frame)
> - org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue() @bci=4, line=80 (Interpreted frame)
> - org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue() @bci=4, line=91 (Interpreted frame)
> - org.apache.hadoop.mapreduce.Mapper.run(org.apache.hadoop.mapreduce.Mapper$Context) @bci=6, line=144 (Interpreted frame)
> - org.apache.hadoop.mapred.MapTask.runNewMapper(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapreduce.split.JobSplit$TaskSplitIndex, org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter) @bci=228, line=784 (Interpreted frame)
> - org.apache.hadoop.mapred.MapTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=148, line=341 (Interpreted frame)
> - org.apache.hadoop.mapred.YarnChild$2.run() @bci=29, line=163 (Interpreted frame)
> - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
> - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=415 (Interpreted frame)
> - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1628 (Interpreted frame)
> - org.apache.hadoop.mapred.YarnChild.main(java.lang.String[]) @bci=514, line=158 (Interpreted frame)
> {code}
> The issue is that the storage client does not set a socket timeout on its HTTP connections by default, so in some (rare) circumstances the read blocks indefinitely (e.g. when the server on the other side dies unexpectedly).
> The fix is to configure the maximum operation time on the storage client request options.
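
For context, a minimal sketch of what configuring the maximum operation time on the request options can look like with the Azure Storage Java SDK's {{BlobRequestOptions}}. This is illustrative only and not the committed patch; the helper name and the 5-minute value are assumptions.

{code}
import com.microsoft.azure.storage.blob.BlobRequestOptions;
import com.microsoft.azure.storage.blob.CloudBlobClient;

public class BlobClientTimeoutSketch {
    // Example value only; the default chosen by the actual patch may differ.
    private static final int MAX_OPERATION_TIME_MS = 5 * 60 * 1000;

    // Hypothetical helper: bound the total client-side time a single blob
    // operation (including retries) may take, so a read against a server that
    // has died is aborted instead of blocking forever in SocketInputStream.read().
    static void applyClientSideTimeout(CloudBlobClient blobClient) {
        BlobRequestOptions options = blobClient.getDefaultRequestOptions();
        options.setMaximumExecutionTimeInMs(MAX_OPERATION_TIME_MS);
    }
}
{code}

The same options can also be passed per call (e.g. the {{CloudBlob.downloadRange}} overload that accepts {{BlobRequestOptions}}, visible in the stack above) instead of being set on the client's default request options.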




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
