hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15027) Improvements for Hadoop read from AliyunOSS
Date Mon, 13 Nov 2017 12:53:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249523#comment-16249523

Steve Loughran commented on HADOOP-15027:


have a look at [Dancing with Elephants|https://www.slideshare.net/steve_l/dancing-elephants-working-with-object-storage-in-apache-spark-and-hive],
which includes some of the summary & trace data from our benchmarking of TPC-DS against S3A.
Key perf killers: the cost of aborting TCP connections; the performance of listing and getFileStatus()
calls. The latter is always done during the sequential partitioning process, so it slows the entire job.

Common code sequences

Code often does open() + seek() immediately, even when doing forward reads (example: a partitioned
read of a sequential file). Lazy-seek code usually delays the GET request until the first
read after a seek.
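A minimal sketch of that lazy-seek pattern, with a byte array standing in for the object store and a counter standing in for GET requests (all names here are illustrative, not the Hadoop code):

```java
// Hypothetical sketch of lazy seek: seek() only records the target
// offset; the expensive GET is issued on the first read after it.
import java.util.concurrent.atomic.AtomicInteger;

class LazySeekStream {
    private final byte[] remote;         // stands in for the object store
    final AtomicInteger gets = new AtomicInteger(); // GET requests issued
    private long nextReadPos = 0;        // where the caller wants to be
    private long streamPos = -1;         // where the stream is; -1 = closed

    LazySeekStream(byte[] remote) { this.remote = remote; }

    void seek(long pos) {
        nextReadPos = pos;               // cheap: no HTTP traffic here
    }

    int read() {
        if (streamPos != nextReadPos) {  // (re)open at the target offset
            gets.incrementAndGet();      // this is the one GET request
            streamPos = nextReadPos;
        }
        if (streamPos >= remote.length) return -1;
        int b = remote[(int) streamPos] & 0xFF;
        streamPos++; nextReadPos++;
        return b;
    }
}
```

However many times the caller seeks, only the first subsequent read pays for a GET.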

Sequential file formats, including output of Mappers
# open(file)
# seek(offset)
# readFully(bytes[], len, offset)
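The three-step sequence above, sketched against a local file with java.io.RandomAccessFile standing in for the filesystem client (the real Hadoop calls are FileSystem.open() and FSDataInputStream.seek()/readFully()):

```java
// Illustrative sequential-format read: open, seek to the partition
// start, then read the partition fully.
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class SequentialRead {
    static byte[] readPartition(File file, long offset, int len) throws IOException {
        byte[] buf = new byte[len];
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) { // open(file)
            in.seek(offset);                                          // seek(offset)
            in.readFully(buf, 0, len);                                // readFully(...)
        }
        return buf;
    }
}
```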

The columnar stores, by contrast, read the footer and then bounce around the file using the
PositionedReadable API, whose default implementation (seek/read/seek back) is a killer unless you do a lazy seek:

# open(file)
# PositionedReadable.readFully(EOF-offset, bytes[], len, offset)
# PositionedReadable.readFully(offset determined by footer info, bytes[], len, offset)
# PositionedReadable.readFully(offset + 10s or 100s of KB)
# repeated until done
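To see why the default seek/read/seek-back pattern hurts, here is a sketch in which every eager seek counts as a stream repositioning (i.e. an aborted GET against an object store); the class and counter are illustrative, not the Hadoop implementation:

```java
// Sketch of PositionedReadable's default pattern: seek to the position,
// read, then seek back. Without lazy seek, each positioned read forces
// two repositionings of the underlying stream.
import java.util.concurrent.atomic.AtomicInteger;

class EagerSeekStream {
    private final byte[] data;
    final AtomicInteger repositions = new AtomicInteger();
    private long pos = 0;

    EagerSeekStream(byte[] data) { this.data = data; }

    void seek(long p) {
        if (p != pos) repositions.incrementAndGet(); // eager: costs a GET
        pos = p;
    }

    int readFully(long position, byte[] buf, int off, int len) {
        long old = pos;
        seek(position);                  // jump to the footer/stripe
        System.arraycopy(data, (int) position, buf, off, len);
        pos += len;
        seek(old);                       // default impl restores position
        return len;
    }
}
```

Two positioned reads (footer, then a stripe) cost four repositionings; a lazy-seek stream would defer and often coalesce them.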

So: backwards as well as forwards, with big leaps through the file. (This isn't a real trace
BTW; we should really collect some.)

[~uncleGen] wrote

bq. 2. move thread pool from InputStream to FileSystem: can this lead to blocking between
multiple input stream? I mean if threads are working on slow stuff for one abnormal file ,
other input streams should be waiting for a long time.
3. expensive cost to use thread pool.

Depends on the number of threads a specific stream can have allocated.

Look at {{org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor}}; it's used in S3ABlockOutputStream
to allow >1 thread per stream to upload data, using a fair Semaphore to allocate threads
from the pool fairly across all streams. Then, when the pool is used up, it blocks the caller,
so any thread generating too much data is the one which gets blocked.


* eliminates thread creation overhead on stream creation (expensive on memory; slow).
* gives streams the ability to use >1 thread for IO.
* shares the pool fairly across streams.
* blocks callers under heavy load (rather than letting the pool expand until OOM).

If you think it's good, we could move that class to hadoop-common and share it.
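The core idea can be sketched with java.util.concurrent alone: a delegating executor that takes a fair-semaphore permit before submitting, so the producer blocks once its cap of outstanding tasks is reached. This is a minimal illustration in the spirit of SemaphoredDelegatingExecutor, not the S3A code itself:

```java
// Sketch: a semaphore-bounded delegating executor. submit() blocks the
// *caller* when the cap on outstanding tasks is reached, instead of
// letting the work queue (and memory) grow without bound.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

class BoundedExecutor {
    private final ExecutorService delegate;
    private final Semaphore permits;     // fair: FIFO across blocked callers

    BoundedExecutor(ExecutorService delegate, int maxOutstanding) {
        this.delegate = delegate;
        this.permits = new Semaphore(maxOutstanding, true);
    }

    Future<?> submit(Runnable task) throws InterruptedException {
        permits.acquire();               // blocks the producer when full
        return delegate.submit(() -> {
            try {
                task.run();
            } finally {
                permits.release();       // free the permit on completion
            }
        });
    }
}
```

With a shared underlying pool, each stream gets its own cap, so one slow or abnormal file can only hold its own permits, not starve every other stream.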

> Improvements for Hadoop read from AliyunOSS
> -------------------------------------------
>                 Key: HADOOP-15027
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15027
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/oss
>    Affects Versions: 3.0.0
>            Reporter: wujinhu
>            Assignee: wujinhu
>         Attachments: HADOOP-15027.001.patch, HADOOP-15027.002.patch, HADOOP-15027.003.patch
> Currently, read performance is poor when Hadoop reads from AliyunOSS. It takes about
1 min to read 1GB from OSS.
> Class AliyunOSSInputStream uses a single thread to read data from AliyunOSS, so we can
refactor this to use multi-threaded pre-reads to improve performance.

This message was sent by Atlassian JIRA
