hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13286) add a scale test to do gunzip and linecount
Date Mon, 20 Jun 2016 13:25:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339477#comment-15339477 ]

Steve Loughran commented on HADOOP-13286:
-----------------------------------------

In a test against S3 Ireland, opening the file with the sequential policy, the read took 9.6s:
{code}
Running org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.537 sec - in org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance
{code}

The closest equivalent test is {{testTimeToOpenAndReadWholeFileByByte}}, which, interestingly,
takes slightly longer, at least for me. (disclaimer, this is 
{code}
Running org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.329 sec - in org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance
{code}

Given that decompress + line-by-line reading is a pattern we see in real code, I'd actually like to keep it and
cut the {{testTimeToOpenAndReadWholeFileByByte}} test.
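
For reference, the decompress + line-by-line pattern can be sketched roughly as below. This is a minimal illustration only, not the test in the patch: it uses the JDK's {{GZIPInputStream}} and {{BufferedReader}} rather than Hadoop's compression codec and {{LineReader}}, reads an in-memory buffer instead of a file on S3, and the class and method names ({{GunzipLineCount}}, {{gzip}}, {{countLines}}) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: JDK-only stand-in for "gunzip and linecount".
public class GunzipLineCount {

    /** Gzip a string in memory; stands in for the .gz test file. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the gzip trailer has been written.
        return bytes.toByteArray();
    }

    /** Decompress the stream and count its lines, reading line by line. */
    static long countLines(InputStream compressed) {
        long lines = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(compressed), StandardCharsets.UTF_8))) {
            while (reader.readLine() != null) {
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Tiny made-up CSV payload; a real scale test would read a large remote file.
        byte[] data = gzip("id,name\n1,alpha\n2,beta\n");
        long start = System.nanoTime();
        long lines = countLines(new ByteArrayInputStream(data));
        long micros = (System.nanoTime() - start) / 1_000;
        // Print simple metrics, in the spirit of "with metric printing".
        System.out.println("lines=" + lines + " elapsed=" + micros + "us");
    }
}
```

In a real scale test the {{InputStream}} would come from the filesystem under test, so the measured time would include the object store's open and read latency rather than just decompression.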

> add a scale test to do gunzip and linecount
> -------------------------------------------
>
>                 Key: HADOOP-13286
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13286
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13286-branch-2-001.patch
>
>
> The HADOOP-13203 patch proposal showed that there were performance problems downstream which weren't surfacing in the current scale tests.
> Trying to decompress the .gz test file and then go through it with LineReader models a basic use case: parse a .csv.gz data source.
> Add this, with metric printing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

