hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.
Date Wed, 07 Mar 2018 14:26:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16389616#comment-16389616 ]

Steve Loughran commented on HADOOP-15292:
-----------------------------------------

# I like the extra instrumentation & probes; if it works for HDFS it'll be the same everywhere
# I think Chris's comment about {{sourceOffset != inStream.getPos()}} is valid. If the
file is newly opened, this is the same as offset != 0; otherwise it's relative to where you
are in the stream.
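
To make the second point concrete, here is a minimal sketch of the "seek only when the position differs, then read sequentially" pattern. It is not the patch itself: it uses java.io.RandomAccessFile as a stand-in for the Hadoop input stream (getFilePointer() playing the role of getPos()), and the class and method names (SeekIfNeededSketch, seekIfNeeded, readFrom) are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: RandomAccessFile stands in for the Hadoop stream,
// and all names here are hypothetical, not taken from the patch.
class SeekIfNeededSketch {

    /** Seeks only when the stream is not already at sourceOffset; returns true if it seeked. */
    static boolean seekIfNeeded(RandomAccessFile in, long sourceOffset) throws IOException {
        if (sourceOffset != in.getFilePointer()) {
            in.seek(sourceOffset);
            return true;
        }
        return false;
    }

    /** Reads sequentially from sourceOffset until buf is full or EOF; returns bytes read. */
    static int readFrom(RandomAccessFile in, long sourceOffset, byte[] buf) throws IOException {
        seekIfNeeded(in, sourceOffset);
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of file
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("seek-sketch", ".bin");
        try {
            Files.write(tmp, new byte[] {0, 1, 2, 3, 4, 5, 6, 7});
            try (RandomAccessFile in = new RandomAccessFile(tmp.toFile(), "r")) {
                // Newly opened stream: position is 0, so offset 0 needs no seek.
                System.out.println("seek for offset 0: " + seekIfNeeded(in, 0));
                // Read 3 bytes starting at offset 2; the position ends up at 5.
                readFrom(in, 2, new byte[3]);
                // Already at 5, so a follow-on read from 5 needs no seek either.
                System.out.println("seek for offset 5: " + seekIfNeeded(in, 5));
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

On a freshly opened stream the check reduces to offset != 0; on a stream that has already been read, it compares against wherever the previous read left off.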

W.r.t. S3 testing, I can see why it wouldn't be your default, but our test suites are designed
to be very low cost (no persistent data, a bias towards uploads, and large downloads all from
AWS-funded buckets). It's worth getting set up for this to help verify consistent behaviour everywhere.


At the very least, make sure the Azure WASB store tests are happy. (You don't get an ADL test
until HADOOP-15209.)

> Distcp's use of pread is slowing it down.
> -----------------------------------------
>
>                 Key: HADOOP-15292
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15292
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 2.5.0
>            Reporter: Virajith Jalaparti
>            Priority: Minor
>         Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch
>
>
> Distcp currently uses positioned-reads (in RetriableFileCopyCommand#copyBytes) when the
> source offset is > 0. This results in unnecessary overheads (new BlockReader being created
> on the client-side, multiple readBlock() calls to the Datanodes, each of which requires the
> creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
