hive-issues mailing list archives

From "Sankar Hariappan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16901) Distcp optimization - One distcp per ReplCopyTask
Date Thu, 15 Jun 2017 06:04:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sankar Hariappan updated HIVE-16901:
------------------------------------
    Description: 
Currently, when a ReplCopyTask is created to copy a list of files, distcp is invoked once
for each file. Instead, the full list of source files should be passed to a single distcp
invocation, which copies the files in parallel and should give a significant performance gain.

If the copy of the list of files fails, traverse the destination directory to find which
files are missing or have checksum mismatches, then trigger the copy of those files one by one.
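
The fallback described above can be sketched roughly as follows. This is only an illustration on a local filesystem, not Hive's ReplCopyTask code and not the distcp API: bulk_copy is a hypothetical stand-in for a single distcp run over the whole list, SHA-256 stands in for whatever checksum the real filesystem exposes, and all function names are assumptions made for the example. It also assumes source files have distinct base names.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable, List


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file (stand-in for a filesystem checksum)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bulk_copy(sources: List[Path], dest_dir: Path) -> None:
    """Hypothetical stand-in for one distcp invocation over the full file list."""
    for src in sources:
        shutil.copy2(src, dest_dir / src.name)


def files_needing_retry(sources: List[Path], dest_dir: Path) -> List[Path]:
    """Return the sources whose destination copy is missing or has a different checksum."""
    retry = []
    for src in sources:
        dst = dest_dir / src.name
        if not dst.is_file() or file_checksum(dst) != file_checksum(src):
            retry.append(src)
    return retry


def copy_file_list(
    sources: List[Path],
    dest_dir: Path,
    bulk: Callable[[List[Path], Path], None] = bulk_copy,
) -> List[Path]:
    """Copy all sources with one bulk call; if it fails, re-copy only the bad files.

    Returns the list of files that had to be copied individually.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    try:
        bulk(sources, dest_dir)
        return []
    except OSError:
        # The bulk copy failed part-way: inspect the destination and
        # copy only the missing or mismatching files, one by one.
        retried = files_needing_retry(sources, dest_dir)
        for src in retried:
            shutil.copy2(src, dest_dir / src.name)
        return retried
```

The bulk step is passed in as a parameter so that a failing copy can be simulated; in a real setting that call would be replaced by the actual distcp invocation and the checksum by the one the target filesystem provides.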

  was:
Currently, when a CopyTask is created to copy a list of files, distcp is invoked once
for each file. Instead, the full list of source files should be passed to a single distcp
invocation, which copies the files in parallel and should give a significant performance gain.

If the copy of the list of files fails, traverse the destination directory to find which
files are missing or have checksum mismatches, then trigger the copy of those files one by one.


> Distcp optimization - One distcp per ReplCopyTask 
> --------------------------------------------------
>
>                 Key: HIVE-16901
>                 URL: https://issues.apache.org/jira/browse/HIVE-16901
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive, repl
>    Affects Versions: 2.1.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>              Labels: DR, replication
>             Fix For: 3.0.0
>
>
> Currently, when a ReplCopyTask is created to copy a list of files, distcp is invoked once
> for each file. Instead, the full list of source files should be passed to a single distcp
> invocation, which copies the files in parallel and should give a significant performance gain.
> If the copy of the list of files fails, traverse the destination directory to find which
> files are missing or have checksum mismatches, then trigger the copy of those files one by one.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
