hadoop-common-issues mailing list archives

From "Genmao Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
Date Mon, 13 Nov 2017 04:31:01 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249093#comment-16249093 ]

Genmao Yu commented on HADOOP-15027:

[~wujinhu] Thanks for the work. Some comments follow:

1. I doubt that concurrent reading within one input stream is necessary. IMHO, it does
help accelerate reading data. However, in a distributed setting the job is already split
into multiple tasks, i.e. data is already read in parallel at the task level. In the Aliyun
environment, as far as I know, one OSS client may achieve tens of MB per second.
2. {{move thread pool from InputStream to FileSystem}}: can this lead to blocking between
multiple input streams? I mean, if the threads are busy with slow work on one abnormal file (?),
other input streams would have to wait for a long time.
3. Using a thread pool is expensive.

So my concern is whether a concurrency mechanism within one input stream is necessary at all.

4. Some code style warnings: the 80-character line length limit.
5. Could {{CacheItem}} be reused?
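
The blocking concern in point 2 can be illustrated with a small, self-contained sketch. It does not use the patch's code: the class and method names below are hypothetical, and the "reads" are simulated with latches rather than real OSS calls. It only shows that, with one fixed-size pool shared by all streams, a read from a healthy stream stays queued while every worker is held by slow reads from another stream.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from HADOOP-15027 patches.
class SharedPoolBlockingSketch {

  /**
   * Returns true if a read submitted by a healthy stream could not start
   * while all workers of the shared pool were held by a slow stream.
   */
  static boolean fastReadWaitsBehindSlowReads(int poolSize) throws Exception {
    ExecutorService sharedPool = Executors.newFixedThreadPool(poolSize);
    CountDownLatch slowReadsRunning = new CountDownLatch(poolSize);
    CountDownLatch releaseSlowReads = new CountDownLatch(1);
    CountDownLatch fastReadStarted = new CountDownLatch(1);
    try {
      // Simulated slow stream: its reads occupy every worker thread.
      Runnable slowRead = () -> {
        slowReadsRunning.countDown();
        try {
          releaseSlowReads.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      };
      for (int i = 0; i < poolSize; i++) {
        sharedPool.submit(slowRead);
      }
      slowReadsRunning.await();

      // Simulated healthy stream: its read is queued behind the slow ones.
      Runnable fastRead = () -> fastReadStarted.countDown();
      Future<?> fast = sharedPool.submit(fastRead);
      boolean startedWhileBlocked =
          fastReadStarted.await(200, TimeUnit.MILLISECONDS);

      // Once the slow reads finish, the queued read can finally run.
      releaseSlowReads.countDown();
      fast.get(5, TimeUnit.SECONDS);
      return !startedWhileBlocked;
    } finally {
      sharedPool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("fast read queued behind slow reads: "
        + fastReadWaitsBehindSlowReads(2));
  }
}
```

Whether this matters in practice depends on the pool size and on how often a single file is slow; a per-stream limit on in-flight tasks would be one way to bound the effect.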

> Improvements for Hadoop read from AliyunOSS
> -------------------------------------------
>                 Key: HADOOP-15027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15027
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>         Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It takes about
> 1 min to read 1 GB from OSS.
> Class AliyunOSSInputStream uses a single thread to read data from AliyunOSS, so we can
> refactor it to use multi-threaded pre-read to improve performance.
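
As a rough sketch of what "multi-threaded pre-read" could look like (an assumption for illustration, not the attached patches): the object is split into fixed-size parts, each part is fetched by a pool thread, and the parts are reassembled in order. Here `readRange` is a hypothetical stand-in for a ranged GET and reads from an in-memory byte array instead of OSS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; names are not from AliyunOSSInputStream.
class PreReadSketch {

  /** Stand-in for a ranged read against the object store. */
  static byte[] readRange(byte[] object, int start, int len) {
    return Arrays.copyOfRange(object, start, start + len);
  }

  /** Fetches all parts concurrently and returns them joined in order. */
  static byte[] readAll(byte[] object, int partSize, int threads)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // Submit one ranged read per part; futures keep submission order.
      List<Future<byte[]>> parts = new ArrayList<>();
      for (int start = 0; start < object.length; start += partSize) {
        final int s = start;
        final int len = Math.min(partSize, object.length - start);
        parts.add(pool.submit(() -> readRange(object, s, len)));
      }
      // Consume parts in order, as a sequential reader of the stream would.
      byte[] out = new byte[object.length];
      int pos = 0;
      for (Future<byte[]> f : parts) {
        byte[] part = f.get();
        System.arraycopy(part, 0, out, pos, part.length);
        pos += part.length;
      }
      return out;
    } finally {
      pool.shutdown();
    }
  }
}
```

A real input stream would bound how many parts are fetched ahead of the reader instead of submitting the whole object at once.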

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
