hadoop-common-issues mailing list archives

From "Thomas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14535) wasb: support for random access and seek of block blobs
Date Tue, 11 Jul 2017 16:15:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082448#comment-16082448 ]

Thomas commented on HADOOP-14535:
---------------------------------

Thanks for moving this forward, Steve!  My responses to your comments are below.
Please let me know if I need to do anything further, as it looks like you have already made
the changes that you requested.

1. I agree that we need to improve how Jenkins runs the azure tests.  Let's clarify the requirements
in HADOOP-14553 and assign it to either me or Georgi, unless you were planning to take
it on.  On a side note, it takes me ~12 minutes to run all 717 hadoop-azure tests.  My development
environment (Linux virtual machine) and my storage account are in the West US region.  I am
fortunate to have both in the same data center.  You mention that it takes a long time to
run the tests, and I suspect this is due to the network path between your development environment
and the storage account.  Are you using an Azure storage account that is regionally located
near you?

2. BlockBlobInputStream.seek is only called for reverse seek due to the implementation of
NativeAzureFsInputStream.seek.  Since BlockBlobInputStream.seek is never called for a forward
or no-op seek, and there is no good way to exercise such a code path in the unit tests, I
don't think BlockBlobInputStream.seek should be implemented to handle these cases.  Anyhow,
it doesn't matter if you already made the change.

3. TestBlockBlobInputStream intentionally leaves the 128 MB file in place to speed up the
next test run.  This makes subsequent runs considerably faster, because the 128 MB file is
created only once.  Earlier, you asked for a permanent shared file for testing, but I don't
have a way to provide that.  Creating the file once and re-using it has similar benefits.
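
To make point 2 concrete, here is a minimal sketch of the dispatch described there: the outer
stream handles forward and no-op seeks by skipping, and calls the inner stream's seek only
for a reverse seek.  All names below (SeekDispatchSketch, Source, SeekableSource,
CountingSource, OuterStream) are hypothetical stand-ins, not the actual
NativeAzureFsInputStream or BlockBlobInputStream code, and the close-and-reopen fallback for
non-seekable streams is omitted.

```java
/** Illustrative sketch only; these are not the real hadoop-azure classes. */
final class SeekDispatchSketch {

    /** Hypothetical stand-in for the inner stream (e.g. a block blob reader). */
    interface Source {
        int read();          // next byte, or -1 at end of data
        long skip(long n);   // skips up to n bytes and returns the number skipped
    }

    /** Hypothetical marker for inner streams that can reposition themselves. */
    interface SeekableSource extends Source {
        void seek(long pos);
    }

    /** In-memory source that counts how often its own seek() is invoked. */
    static final class CountingSource implements SeekableSource {
        private final byte[] data;
        private int pos;
        int seekCalls;

        CountingSource(byte[] data) { this.data = data; }

        @Override public int read() {
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }

        @Override public long skip(long n) {
            long skipped = Math.max(0L, Math.min(n, (long) data.length - pos));
            pos += (int) skipped;
            return skipped;
        }

        @Override public void seek(long target) {
            seekCalls++;
            pos = (int) target;
        }
    }

    /** Outer stream: plays the role that NativeAzureFsInputStream has in this thread. */
    static final class OuterStream {
        private final Source in;
        private long pos;

        OuterStream(Source in) { this.in = in; }

        void seek(long target) {
            if (target < 0) {
                throw new IllegalArgumentException("negative seek position");
            }
            if (target < pos) {
                // Reverse seek: the only case that reaches the inner seek().
                if (!(in instanceof SeekableSource)) {
                    throw new UnsupportedOperationException("reopen path omitted in this sketch");
                }
                ((SeekableSource) in).seek(target);
                pos = target;
            } else {
                // Forward or no-op seek: handled by skipping; inner seek() is never called.
                pos += in.skip(target - pos);
            }
        }

        int read() {
            int b = in.read();
            if (b >= 0) {
                pos++;
            }
            return b;
        }

        long getPos() { return pos; }
    }
}
```

With a CountingSource wrapped in an OuterStream, seeking forward or to the current position
leaves seekCalls at zero; only a seek to an earlier position increments it.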


> wasb: support for random access and seek of block blobs
> -------------------------------------------------------
>
>                 Key: HADOOP-14535
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14535
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>            Reporter: Thomas
>            Assignee: Thomas
>         Attachments: 0001-Random-access-and-seek-imporvements-to-azure-file-system.patch,
>                      0003-Random-access-and-seek-imporvements-to-azure-file-system.patch,
>                      0004-Random-access-and-seek-imporvements-to-azure-file-system.patch,
>                      0005-Random-access-and-seek-imporvements-to-azure-file-system.patch,
>                      HADOOP-14535-006.patch
>
>
> This change adds a seek-able stream for reading block blobs to the wasb:// file system.
> If seek() is not used or if only forward seek() is used, the behavior of read() is unchanged.
> That is, the stream is optimized for sequential reads by reading chunks (over the network) in
> the size specified by "fs.azure.read.request.size" (default is 4 megabytes).
> If reverse seek() is used, the behavior of read() changes in favor of reading the actual number
> of bytes requested in the call to read(), with some constraints.  If the size requested is smaller
> than 16 kilobytes and cannot be satisfied by the internal buffer, the network read will be 16
> kilobytes.  If the size requested is greater than 4 megabytes, it will be satisfied by sequential
> 4 megabyte reads over the network.
> This change improves the performance of FSInputStream.seek() by not closing and re-opening the
> stream, which for block blobs also involves a network operation to read the blob metadata.  Now
> NativeAzureFsInputStream.seek() checks if the stream is seek-able and moves the read position.
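
The read-size rules in the description above can be sketched as a small policy function.
This is an illustration under stated assumptions, not the patch's code: the class and method
names are hypothetical, the 4 megabyte constant stands in for the default value of
"fs.azure.read.request.size", and the sketch ignores both the internal buffer and the
shorter read that would occur at the end of a blob.

```java
/** Illustrative sketch only; not the actual BlockBlobInputStream logic. */
final class ReadSizeSketch {

    /** 16 kilobyte floor for network reads after a reverse seek (from the description). */
    static final int MIN_RANDOM_READ = 16 * 1024;

    /** Stand-in for the default of fs.azure.read.request.size (4 megabytes). */
    static final int MAX_REQUEST = 4 * 1024 * 1024;

    /**
     * Size of the next network read for a read() call that the buffer cannot satisfy.
     *
     * @param requested       bytes asked for in the call to read(), must be positive
     * @param reverseSeekSeen whether a reverse seek() has been used on the stream
     */
    static int nextNetworkReadSize(long requested, boolean reverseSeekSeen) {
        if (requested <= 0) {
            throw new IllegalArgumentException("requested must be positive");
        }
        if (!reverseSeekSeen) {
            // Sequential mode: always fetch a full chunk.
            return MAX_REQUEST;
        }
        // Random-access mode: fetch what was asked for, clamped to [16 KB, 4 MB].
        long clamped = Math.max(MIN_RANDOM_READ, Math.min(requested, MAX_REQUEST));
        return (int) clamped;
    }

    /** Number of sequential network reads needed to satisfy a request in random-access mode. */
    static long networkReadsNeeded(long requested) {
        if (requested <= 0) {
            throw new IllegalArgumentException("requested must be positive");
        }
        return (requested + MAX_REQUEST - 1) / MAX_REQUEST;
    }
}
```

Under these assumptions, a 100 byte read after a reverse seek triggers a 16 kilobyte network
read, a 1 megabyte read fetches exactly 1 megabyte, and a 10 megabyte read is served by three
network reads of at most 4 megabytes each.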



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


