hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14996) wasb: ReadFully occasionally fails when using page blobs
Date Tue, 31 Oct 2017 11:08:02 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226618#comment-16226618 ]

Steve Loughran commented on HADOOP-14996:
-----------------------------------------

In theory, input streams aren't thread safe (that's at the java.io.InputStream spec level),
so nothing should be using them in parallel.
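
(Purely to make that concrete, here is a minimal sketch of my own, not from the JIRA,
contrasting the two access patterns: the stateful seek()/readFully() pair shares the stream's
position, which is what goes wrong under concurrent use, whereas the positioned
readFully(position, ...) overload on FSDataInputStream carries its own offset, and I'd expect
that to be nearer to what the HBase-style callers were depending on.)

{code}
// Sketch only: contrasts stateful vs positioned reads on an FSDataInputStream.
// Assumes the file passed as args[0] is at least a few hundred bytes long.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StreamAccessSketch {
  public static void main(String[] args) throws Exception {
    Path path = new Path(args[0]);                     // any readable file
    FileSystem fs = path.getFileSystem(new Configuration());
    byte[] buf = new byte[16];
    try (FSDataInputStream in = fs.open(path)) {
      // Stateful access: the position is mutable state held by the stream itself,
      // so interleaved use from two threads can read from the wrong offset.
      in.seek(128);
      in.readFully(buf);                               // 16 bytes from offset 128

      // Positioned access: the offset travels with the call and the stream's
      // own position is left alone.
      in.readFully(256, buf, 0, buf.length);           // 16 bytes from offset 256
    }
  }
}
{code}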

But some apps are known to treat them as thread safe, because HDFS's DFSInputStream did offer
stronger guarantees and apps (HBase) coded against that. {{SequenceFile$Reader}} doesn't make
that assumption, AFAIK. The input stream it is reading was created a few lines earlier in the
{{initialize()}} method; at this point it is seeking to the start of the split's data:

{code}
    if (fileSplit.getStart() > in.getPosition()) {
      in.sync(fileSplit.getStart());                  // sync to start
    }
{code}

Inside {{sync()}} it skips 4 bytes and tries to read the 16-byte sync marker:
{code}
        seek(position+4);
        in.readFully(syncCheck);   // HERE
{code}
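
For reference, here is a tiny self-contained sketch (mine, not from the codebase) of the
java.io.DataInputStream readFully contract that the call above relies on: if the stream ends
before the buffer is filled, it throws EOFException rather than returning a short read.

{code}
// Minimal illustration, plain java.io only: readFully() throws EOFException
// when fewer bytes remain than the buffer needs, instead of returning a
// partial read.
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;

public class ReadFullyContract {
  public static void main(String[] args) throws Exception {
    byte[] remaining = new byte[10];      // pretend only 10 bytes are left in the stream
    byte[] syncCheck = new byte[16];      // but the reader wants a 16 byte sync marker
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(remaining))) {
      in.readFully(syncCheck);            // cannot be satisfied
    } catch (EOFException e) {
      System.out.println("EOFException: stream ended before 16 bytes were read");
    }
  }
}
{code}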

If it's failing in multithreaded IO, I wouldn't put the blame on readFully. It throws an
EOFException if there aren't enough bytes left to fill the buffer, which implies either that
the offset the stream has synced to is past the end of the blob, or that the start of the
readFully was within the stream length but the end (start + 16 bytes) was not. Possible causes
for either: the partitioning is wrong, or the blob length returned by getFileStatus is less
than the actual length.
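
One way to narrow that down (a diagnostic sketch of my own, not something from the issue)
would be to compare the length getFileStatus reports for one of these page blobs against the
number of bytes the open stream actually delivers before hitting EOF:

{code}
// Hypothetical diagnostic: compare the FileStatus length with the bytes the
// stream actually yields. The path argument is a placeholder; one of the
// wasb:// page blob paths from the report would be the thing to test.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlobLengthCheck {
  public static void main(String[] args) throws Exception {
    Path path = new Path(args[0]);
    FileSystem fs = path.getFileSystem(new Configuration());
    long reported = fs.getFileStatus(path).getLen();

    long readable = 0;
    byte[] buf = new byte[64 * 1024];
    try (FSDataInputStream in = fs.open(path)) {
      int n;
      while ((n = in.read(buf)) > 0) {    // count bytes until EOF
        readable += n;
      }
    }
    System.out.println("getFileStatus len = " + reported
        + ", bytes readable = " + readable);
    // Any mismatch between the two numbers would explain split offsets that
    // look valid against one length but run past the end of the other.
  }
}
{code}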

I wouldn't directly point the blame at the page blob code here, unless there's something up
with how it is doing its seeks. But if it's not that code, why isn't it surfacing elsewhere?

> wasb: ReadFully occasionally fails when using page blobs
> --------------------------------------------------------
>
>                 Key: HADOOP-14996
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14996
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>            Reporter: Thomas Marquardt
>            Assignee: Thomas Marquardt
>
> Looks like there is a functional bug or concurrency bug in the PageBlobInputStream implementation of ReadFully.
> 1) Copying with 1 mapper succeeds:
> hadoop distcp -m 1 wasb://hbt-lifetime@salsbx01sparkdata.blob.core.windows.net/hive_tables wasb://hbt-lifetime-bkp@supporttestl2.blob.core.windows.net/hdi_backup
> 2) Turn on DEBUG logging by setting mapreduce.map.log.level=DEBUG in Ambari, then run with more than 1 mapper. The debug log shows:
> {code}
> 2017-10-27 06:18:53,545 DEBUG [main] org.apache.hadoop.fs.azure.NativeAzureFileSystem: Seek to position 136251. Bytes skipped 136210
> 2017-10-27 06:18:53,549 DEBUG [main] org.apache.hadoop.fs.azure.AzureNativeFileSystemStore: Closing page blob output stream.
> 2017-10-27 06:18:53,549 DEBUG [main] org.apache.hadoop.fs.azure.AzureNativeFileSystemStore: java.util.concurrent.ThreadPoolExecutor@73dce0e6[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
> 2017-10-27 06:18:53,549 DEBUG [main] org.apache.hadoop.security.UserGroupInformation: PrivilegedActionException as:mssupport (auth:SIMPLE) cause:java.io.EOFException
> 2017-10-27 06:18:53,553 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:197)
> at java.io.DataInputStream.readFully(DataInputStream.java:169)
> at org.apache.hadoop.io.SequenceFile$Reader.sync(SequenceFile.java:2693)
> at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:58)
> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
> {code}




