hadoop-hdfs-issues mailing list archives

From "Mukul Kumar Singh (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-11786) Add support to make copyFromLocal multi threaded
Date Sun, 02 Jul 2017 05:56:01 GMT

     [ https://issues.apache.org/jira/browse/HDFS-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mukul Kumar Singh updated HDFS-11786:
    Attachment: HDFS-11786.005.patch

Thanks for the review, [~anu]. The latest patch should fix the checkstyle warnings as well.

Here is what the new help for the command will look like:
HW13605:multi_thread_upload msingh$ hadoop-dist/target/hadoop-3.0.0-alpha4-SNAPSHOT/bin/hdfs dfs -help copyFromLocal
-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst>
  Copy files from the local file system into fs. Copying fails if the file already
  exists, unless the -f flag is given.
  -p                 Preserves access and modification times, ownership and the  
  -f                 Overwrites the destination if it already exists.            
  -t <thread count>  Number of threads to be used, default is 1.                 
  -l                 Allow DataNode to lazily persist the file to disk. Forces   
                     replication factor of 1. This flag will result in reduced   
                     durability. Use with care.                                  
  -d                 Skip creation of temporary file(<dst>._COPYING_).       

> Add support to make copyFromLocal multi threaded
> ------------------------------------------------
>                 Key: HDFS-11786
>                 URL: https://issues.apache.org/jira/browse/HDFS-11786
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>            Reporter: Mukul Kumar Singh
>            Assignee: Mukul Kumar Singh
>         Attachments: HDFS-11786.001.patch, HDFS-11786.002.patch, HDFS-11786.003.patch, HDFS-11786.004.patch, HDFS-11786.005.patch
> CopyFromLocal/Put is not currently multithreaded.
> When multiple files need to be uploaded to HDFS, a single thread reads each file in turn and then copies its data to the cluster.
> This copy to hdfs can be made faster by uploading multiple files in parallel.
> I am attaching the initial patch so that I can get some initial feedback.
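
The idea in the description (upload several files at once instead of one after another) can be sketched with a fixed-size thread pool. This is only an illustration under stated assumptions, not the code in the attached patches: it copies between local directories with java.nio.file rather than through the Hadoop FileSystem API, and the class name ParallelCopySketch and the method copyAll are hypothetical. The threadCount parameter plays the role of the -t option and the overwrite parameter the role of -f from the help text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: local-to-local copy standing in for an HDFS upload.
public class ParallelCopySketch {

    /**
     * Copies every source file into dstDir using up to threadCount threads.
     * Fails if a target already exists unless overwrite is true.
     * Returns the number of files copied.
     */
    public static int copyAll(List<Path> sources, Path dstDir,
                              int threadCount, boolean overwrite)
            throws IOException, InterruptedException {
        if (threadCount < 1) {
            throw new IllegalArgumentException("threadCount must be >= 1");
        }
        Files.createDirectories(dstDir);
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            // Submit one copy task per file; the pool bounds the parallelism.
            List<Future<Path>> pending = new ArrayList<>();
            for (Path src : sources) {
                pending.add(pool.submit(() -> {
                    Path target = dstDir.resolve(src.getFileName());
                    if (overwrite) {
                        return Files.copy(src, target,
                                StandardCopyOption.REPLACE_EXISTING);
                    }
                    // Throws FileAlreadyExistsException if target exists.
                    return Files.copy(src, target);
                }));
            }
            // Wait for every task and surface the first failure.
            int copied = 0;
            for (Future<Path> task : pending) {
                try {
                    task.get();
                    copied++;
                } catch (ExecutionException e) {
                    throw new IOException("copy failed", e.getCause());
                }
            }
            return copied;
        } finally {
            // Stops any tasks still queued or running after a failure.
            pool.shutdownNow();
        }
    }
}
```

With threadCount set to 1 the sketch degenerates to the sequential behaviour described above, which matches the documented default for -t.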

This message was sent by Atlassian JIRA

