hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14965) s3a input stream "normal" fadvise mode to be adaptive
Date Mon, 23 Oct 2017 15:37:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215310#comment-16215310 ]

Steve Loughran commented on HADOOP-14965:

I have no data on real-world use yet, but here's how the new adaptive scheme breaks a test:
it cuts the number of stream aborts from 4 to 1. Note also that the stream stats now include
the enum value of the input policy (InputPolicy) & the number of times it was set (InputPolicySetCount).

 T E S T S
Running org.apache.hadoop.fs.s3a.scale.ITestS3AInputStreamPerformance
Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 99.27 sec <<< FAILURE!
- in org.apache.hadoop.fs.s3a.scale.ITestS3AInputStreamPerformance
testRandomIONormalPolicy(org.apache.hadoop.fs.s3a.scale.ITestS3AInputStreamPerformance)  Time
elapsed: 5.651 sec  <<< FAILURE!
java.lang.AssertionError: streams aborted in StreamStatistics{OpenOperations=4, CloseOperations=4,
Closed=3, Aborted=1, SeekOperations=2, ReadExceptions=0, ForwardSeekOperations=0, BackwardSeekOperations=2,
BytesSkippedOnSeek=0, BytesBackwardsOnSeek=6356992, BytesRead=1376256, BytesRead excluding
skipped=1376256, ReadOperations=161, ReadFullyOperations=4, ReadsIncomplete=157, BytesReadInClose=0,
BytesDiscardedInAbort=43375083, InputPolicy=2, InputPolicySetCount=2} expected:<4> but
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.apache.hadoop.fs.s3a.scale.ITestS3AInputStreamPerformance.testRandomIONormalPolicy(ITestS3AInputStreamPerformance.java:429)

> s3a input stream "normal" fadvise mode to be adaptive
> -----------------------------------------------------
>                 Key: HADOOP-14965
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14965
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Steve Loughran
> HADOOP-14535 added seek optimisation to wasb, but rather than require the caller to declare
> sequential vs random, it works it out for itself.
> # defaults to sequential, lazy seek
> # if the caller ever seeks backwards, switches to random IO.
> This means that the use pattern of columnar stores (go to end of file, read summary,
> then go to columns and work forwards) will switch to random IO after that first seek back
> (cost: one aborted HTTP connection).
> Where this should benefit the most is in downstream apps where you are working with different
> data sources in the same object store/running off the same app config, but have different read
> patterns. I'm seeing exactly this in some of my spark tests, where it's near impossible to
> set things up so that .gz files are read sequentially, but ORC data is read in random IO.
> I propose the "normal" fadvise => adaptive, sequential => sequential always, random
> => random from the outset.
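
For illustration only, the proposed mapping could be modelled roughly as below. This is a minimal sketch, not the S3AInputStream code: the class name, the enum, and every method are hypothetical, and it assumes that a backward seek under the sequential policy costs exactly one aborted connection while forward seeks and seeks under the random policy cost none.

```java
// Hypothetical model of the proposed "normal => adaptive" behaviour.
// None of these names come from the S3A code base.
class AdaptivePolicySketch {

    enum InputPolicy { NORMAL, SEQUENTIAL, RANDOM }

    private final boolean adaptive;   // true only when configured as NORMAL
    private InputPolicy effective;    // policy currently in force
    private long pos;                 // current stream position
    private int aborts;               // simulated aborted HTTP connections
    private int policySetCount = 1;   // how many times the policy has been set

    AdaptivePolicySketch(InputPolicy configured) {
        this.adaptive = configured == InputPolicy.NORMAL;
        // "normal" starts out sequential; the other two are used as given.
        this.effective = adaptive ? InputPolicy.SEQUENTIAL : configured;
    }

    void seek(long target) {
        if (target < 0) {
            throw new IllegalArgumentException("negative seek: " + target);
        }
        boolean backward = target < pos;
        if (backward && effective == InputPolicy.SEQUENTIAL) {
            // Assumption: a backward seek on a sequential stream aborts
            // the open connection.
            aborts++;
            if (adaptive) {
                // Adaptive mode switches to random IO on the first seek back.
                effective = InputPolicy.RANDOM;
                policySetCount++;
            }
        }
        pos = target;
    }

    InputPolicy effectivePolicy() { return effective; }
    int aborts() { return aborts; }
    int policySetCount() { return policySetCount; }
}
```

Under these assumptions, the columnar pattern (seek to the end, seek back to the columns, then further seeks back) costs one abort when the stream is configured as NORMAL, one abort per backward seek when it is pinned to SEQUENTIAL, and none when it starts as RANDOM.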
