hadoop-common-issues mailing list archives

From "Thomas (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-14535) Support for random access and seek of block blobs
Date Sat, 17 Jun 2017 00:17:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas updated HADOOP-14535:
----------------------------
    Description: 
This change adds a seek-able stream for reading block blobs to the wasb:// file system.

If seek() is not used or if only forward seek() is used, the behavior of read() is unchanged.
That is, the stream is optimized for sequential reads by reading chunks (over the network) in
the size specified by "fs.azure.read.request.size" (default is 4 megabytes).

If reverse seek() is used, the behavior of read() changes in favor of reading the actual number
of bytes requested in the call to read(), with some constraints.  If the size requested is smaller
than 16 kilobytes and cannot be satisfied by the internal buffer, the network read will be 16
kilobytes.  If the size requested is greater than 4 megabytes, it will be satisfied by sequential
4 megabyte reads over the network.
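
The sizing rules above can be modeled roughly as follows. This is a minimal sketch and not the code from the attached patch: the class ReadSizeSketch, the method planNetworkReads, and both constants are hypothetical names. It also assumes that a request the internal buffer can fully satisfy triggers no network read, and that the final chunk of a request larger than 4 megabytes is trimmed to the bytes remaining. Neither detail is stated in the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the read sizing described above; not the patch itself.
final class ReadSizeSketch {
    // 16 KB floor for small reads after a reverse seek (from the description).
    static final int MIN_RANDOM_READ = 16 * 1024;
    // Default value of "fs.azure.read.request.size" (from the description).
    static final int MAX_REQUEST = 4 * 1024 * 1024;

    /** Returns the sizes of the network reads needed for one read() call. */
    static List<Integer> planNetworkReads(int requested, int buffered,
                                          boolean reverseSeekSeen) {
        List<Integer> reads = new ArrayList<>();
        if (requested <= buffered) {
            return reads; // assumption: served entirely from the internal buffer
        }
        if (!reverseSeekSeen) {
            // Sequential mode: fetch full-size chunks until the request is covered.
            for (int remaining = requested - buffered; remaining > 0;
                 remaining -= MAX_REQUEST) {
                reads.add(MAX_REQUEST);
            }
            return reads;
        }
        if (requested < MIN_RANDOM_READ) {
            reads.add(MIN_RANDOM_READ); // small reads are rounded up to 16 KB
            return reads;
        }
        // Random-access mode: read what was asked for, at most 4 MB per request.
        // Assumption: the last chunk is trimmed to the bytes still needed.
        for (int remaining = requested; remaining > 0; remaining -= MAX_REQUEST) {
            reads.add(Math.min(remaining, MAX_REQUEST));
        }
        return reads;
    }

    public static void main(String[] args) {
        System.out.println(planNetworkReads(100, 0, true));
        System.out.println(planNetworkReads(10 * 1024 * 1024, 0, true));
    }
}
```

In this model a 100-byte read after a reverse seek costs one 16 kilobyte request instead of a 4 megabyte one, which is the saving the random-access mode is aiming for.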

This change improves the performance of FSInputStream.seek() by not closing and re-opening the
stream, which for block blobs also involves a network operation to read the blob metadata.  Now
NativeAzureFsInputStream.seek() checks if the stream is seek-able and moves the read position.
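
To illustrate the idea of moving the read position instead of re-opening the stream, here is a minimal in-memory sketch. It is not the NativeAzureFsInputStream implementation: the class SeekSketch and its members are hypothetical, a byte array stands in for the remote block blob, and the use of EOFException for out-of-range targets is an assumption.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical stand-in for a seek-able blob stream; not the patch itself.
final class SeekSketch {
    private final byte[] blob;       // stands in for the remote block blob
    private long pos;                // current read position
    private boolean reverseSeekSeen; // set once any backward seek occurs

    SeekSketch(byte[] blob) {
        this.blob = blob;
    }

    /** Moves the read position without closing or re-opening anything. */
    void seek(long target) throws IOException {
        if (target < 0 || target > blob.length) {
            throw new EOFException("seek target out of range: " + target);
        }
        if (target < pos) {
            reverseSeekSeen = true; // later reads switch to random-access sizing
        }
        pos = target;
    }

    /** Reads one byte at the current position, or returns -1 at end of blob. */
    int read() {
        if (pos >= blob.length) {
            return -1;
        }
        int value = blob[(int) pos] & 0xff;
        pos++;
        return value;
    }

    long getPos() {
        return pos;
    }

    boolean isRandomAccessMode() {
        return reverseSeekSeen;
    }
}
```

Because seek() here only updates a field, its cost does not depend on the blob, whereas a close-and-reopen approach pays for a metadata round trip on every call.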

[^attachment-name.zip]

  was:
This change adds a seek-able stream for reading block blobs to the wasb:// file system.

If seek() is not used or if only forward seek() is used, the behavior of read() is unchanged.
That is, the stream is optimized for sequential reads by reading chunks (over the network) in
the size specified by "fs.azure.read.request.size" (default is 4 megabytes).

If reverse seek() is used, the behavior of read() changes in favor of reading the actual number
of bytes requested in the call to read(), with some constraints.  If the size requested is smaller
than 16 kilobytes and cannot be satisfied by the internal buffer, the network read will be 16
kilobytes.  If the size requested is greater than 4 megabytes, it will be satisifed by sequential
4 megabyte reads over the network.

This change improves the performance of FSInputStream.seek() by not closing and re-opening the
stream, which for block blobs also involves a network operation to read the blob metadata.  Now
NativeAzureFsInputStream.seek() checks if the stream is seek-able and moves the read position.

[^attachment-name.zip]


> Support for random access and seek of block blobs
> -------------------------------------------------
>
>                 Key: HADOOP-14535
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14535
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>            Reporter: Thomas
>             Fix For: 2.9.0, 3.0.0-alpha4
>
>         Attachments: 0001-Random-access-and-seek-imporvements-to-azure-file-system.patch
>
>
> This change adds a seek-able stream for reading block blobs to the wasb:// file system.
> If seek() is not used or if only forward seek() is used, the behavior of read() is unchanged.
> That is, the stream is optimized for sequential reads by reading chunks (over the network) in
> the size specified by "fs.azure.read.request.size" (default is 4 megabytes).
> If reverse seek() is used, the behavior of read() changes in favor of reading the actual number
> of bytes requested in the call to read(), with some constraints.  If the size requested is smaller
> than 16 kilobytes and cannot be satisfied by the internal buffer, the network read will be 16
> kilobytes.  If the size requested is greater than 4 megabytes, it will be satisfied by sequential
> 4 megabyte reads over the network.
> This change improves the performance of FSInputStream.seek() by not closing and re-opening the
> stream, which for block blobs also involves a network operation to read the blob metadata.  Now
> NativeAzureFsInputStream.seek() checks if the stream is seek-able and moves the read position.
> [^attachment-name.zip]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

