hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14535) Support for random access and seek of block blobs
Date Mon, 10 Jul 2017 20:58:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081133#comment-16081133 ]

Steve Loughran commented on HADOOP-14535:

Patch 006. This is patch 005 with all the changes I suggested, particularly the tests.

The original test suite has a couple of operational flaws:
# it's slow
# it leaves 128MB files around, which can be expensive.

I've reworked it in the same style as {{AbstractSTestS3AHugeFiles}}: ordered names guarantee
that the test cases run in sequence, and the final test deletes the file. I've also downsized
the file.
This is lined up for HADOOP-14553, which ports a copy of the same test into Azure and runs
tests in parallel. The tests here should be something which can be merged into
that test, making it a {{scale}} test with a configurable dataset size.
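
To illustrate the ordered-names idea, here is a minimal plain-Java sketch. It is not the code in the patch: the class and method names are hypothetical, and the reflection loop only stands in for a test runner. In JUnit 4 the same effect is normally obtained with {{@FixMethodOrder(MethodSorters.NAME_ASCENDING)}}.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical suite: the numeric prefixes make name order equal to the
// intended execution order, so the cleanup step always runs last.
class OrderedSuiteSketch {
    final List<String> log = new ArrayList<>();

    public void test_010_createFile() { log.add("create"); }
    public void test_020_readFile()   { log.add("read"); }
    public void test_999_deleteFile() { log.add("delete"); }

    // Stand-in for a runner that sorts test methods by name before invoking them.
    static List<String> runInNameOrder() {
        OrderedSuiteSketch suite = new OrderedSuiteSketch();
        Method[] methods = OrderedSuiteSketch.class.getDeclaredMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        try {
            for (Method m : methods) {
                if (m.getName().startsWith("test_")) {
                    m.invoke(suite);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return suite.log;
    }

    public static void main(String[] args) {
        System.out.println(runInNameOrder());
    }
}
```

The point of the naming scheme is that one uploaded file can be shared by every test case, with the last case removing it.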

Tested: new suite, yes. Remainder: in progress

Running org.apache.hadoop.fs.azure.TestBlockBlobInputStream
Tests run: 19, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 212.423 sec - in org.apache.hadoop.fs.azure.TestBlockBlobInputStream

Results :

Tests run: 19, Failures: 0, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:37 min (Wall Clock)
[INFO] Finished at: 2017-07-10T21:46:59+01:00
[INFO] Final Memory: 46M/820M
[INFO] ------------------------------------------------------------------------

> Support for random access and seek of block blobs
> -------------------------------------------------
>                 Key: HADOOP-14535
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14535
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>            Reporter: Thomas
>            Assignee: Thomas
>         Attachments: 0001-Random-access-and-seek-imporvements-to-azure-file-system.patch,
>                      0003-Random-access-and-seek-imporvements-to-azure-file-system.patch,
>                      0004-Random-access-and-seek-imporvements-to-azure-file-system.patch,
>                      0005-Random-access-and-seek-imporvements-to-azure-file-system.patch,
>                      HADOOP-14535-006.patch
> This change adds a seek-able stream for reading block blobs to the wasb:// file system.
> If seek() is not used or if only forward seek() is used, the behavior of read() is unchanged.
> That is, the stream is optimized for sequential reads by reading chunks (over the network)
> of the size specified by "fs.azure.read.request.size" (default is 4 megabytes).
> If reverse seek() is used, the behavior of read() changes in favor of reading the actual
> number of bytes requested in the call to read(), with some constraints.  If the size requested
> is smaller than 16 kilobytes and cannot be satisfied by the internal buffer, the network read
> will be 16 kilobytes.  If the size requested is greater than 4 megabytes, it will be satisfied by
> 4 megabyte reads over the network.
> This change improves the performance of FSInputStream.seek() by not closing and re-opening
> the stream, which for block blobs also involves a network operation to read the blob metadata.
> NativeAzureFsInputStream.seek() checks if the stream is seek-able and moves the read
> [^attachment-name.zip]
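
The read-size rules in the issue description can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the patch: the class, method, and constant names are hypothetical, the 4 megabyte cap is treated as a fixed constant, and details such as buffer hits and end-of-blob handling are left out.

```java
// Hypothetical helper that picks the size of the next network read.
final class ReadSizePolicy {
    // Assumed constants, taken from the figures quoted in the description.
    static final int MIN_RANDOM_READ = 16 * 1024;        // 16 kilobytes
    static final int MAX_RANDOM_READ = 4 * 1024 * 1024;  // 4 megabytes

    /**
     * @param reverseSeekSeen true once a reverse seek() has happened on the stream
     * @param requested       bytes asked for in the call to read(), not served by the buffer
     * @param readRequestSize value of fs.azure.read.request.size (default 4 megabytes)
     */
    static int nextNetworkReadSize(boolean reverseSeekSeen, int requested, int readRequestSize) {
        if (!reverseSeekSeen) {
            // Sequential mode: always fetch a full configured chunk.
            return readRequestSize;
        }
        // Random-access mode: fetch what was asked for, clamped to [16 KB, 4 MB].
        return Math.max(MIN_RANDOM_READ, Math.min(requested, MAX_RANDOM_READ));
    }

    public static void main(String[] args) {
        int chunk = 4 * 1024 * 1024;
        System.out.println(nextNetworkReadSize(false, 100, chunk));             // 4194304
        System.out.println(nextNetworkReadSize(true, 100, chunk));              // 16384
        System.out.println(nextNetworkReadSize(true, 1024 * 1024, chunk));      // 1048576
        System.out.println(nextNetworkReadSize(true, 10 * 1024 * 1024, chunk)); // 4194304
    }
}
```

In this sketch a request larger than 4 megabytes would be served by calling the helper repeatedly, one capped network read at a time.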

This message was sent by Atlassian JIRA
