hive-issues mailing list archives

From "Sankar Hariappan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-16898) Validation of source file after distcp in repl load
Date Fri, 29 Sep 2017 16:59:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16186070#comment-16186070 ]

Sankar Hariappan edited comment on HIVE-16898 at 9/29/17 4:58 PM:
------------------------------------------------------------------

Added 8.patch with the changes below.
- Rebased against master.
- Fixed bugs in the handling of the FileNotFoundException flow after distCp.
- Some code clean-up.

*Note:* A couple of known issues are not handled in this patch and will be tracked in a
separate JIRA.
- If the source file changes twice during distCp and ends up with the same checksum after the
copy, the check passes even though intermediate data was actually copied.
- If distCp fails with FileNotFoundException, it is assumed that no partially copied file
exists in the destination. If the failure does leave partially copied data, we always redirect
the copy to read from the CM path, even if the source file exists.
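
To make the first known issue concrete, here is a minimal Python sketch (not Hive code). The
byte strings and the SHA-256 `checksum` helper are hypothetical stand-ins for the file contents
and for whatever checksum the file system reports. It only shows why a comparison of source
checksums before and after the copy cannot detect this case.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Hypothetical stand-in for a file checksum (SHA-256 of the contents)."""
    return hashlib.sha256(data).hexdigest()


# Assumed timeline of one source file's contents during a single copy.
original = b"rows-v1"       # contents when the copy was planned
intermediate = b"rows-v2"   # first change, which the copy happens to read
restored = original         # second change, back to the original bytes

before = checksum(original)
copied = intermediate       # what actually landed in the destination
after = checksum(restored)

# The source looks unchanged, so a source-only check accepts the copy.
source_looks_unchanged = before == after
# The destination nevertheless holds the intermediate data.
destination_is_correct = checksum(copied) == after
```

Here `source_looks_unchanged` is true while `destination_is_correct` is false, which is the
situation described in the first bullet.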

[~thejas], [~anishek], could you please review this patch?
cc [~daijy]


> Validation of source file after distcp in repl load 
> ----------------------------------------------------
>
>                 Key: HIVE-16898
>                 URL: https://issues.apache.org/jira/browse/HIVE-16898
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.0.0
>            Reporter: anishek
>            Assignee: Sankar Hariappan
>              Labels: pull-request-available
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16898.1.patch, HIVE-16898.2.patch, HIVE-16898.3.patch, HIVE-16898.4.patch,
>                      HIVE-16898.5.patch, HIVE-16898.6.patch, HIVE-16898.7.patch, HIVE-16898.8.patch
>
>
> The source file can change between the time the source and destination paths for distcp are
> decided and the time distcp is invoked, so distcp might copy the wrong file to the destination.
> We should therefore add a check on the checksum of the source file path after distcp finishes,
> to make sure the path did not change during the copy. If it has changed, delete the previously
> copied file on the destination, copy the new source, and repeat the same process until the
> correct file is copied.
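
The copy, validate, and retry loop described in the issue can be sketched roughly as follows.
This is an illustration in Python on local files, not the patch itself. The function name
`copy_with_validation`, the `max_attempts` limit, the SHA-256 checksum, and taking the reference
checksum immediately before each copy are all assumptions made for the example; `shutil.copyfile`
stands in for distcp.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def copy_with_validation(src: Path, dst: Path, copy=shutil.copyfile,
                         max_attempts: int = 3) -> int:
    """Copy src to dst and retry while the source checksum changes during the copy.

    Returns the number of attempts used. Raises RuntimeError if the source
    never stays stable for the duration of one copy.
    """
    for attempt in range(1, max_attempts + 1):
        before = checksum(src)          # checksum the copy is expected to match
        copy(src, dst)                  # stand-in for the distcp invocation
        if checksum(src) == before:     # source did not change while copying
            return attempt
        # Source changed during the copy: drop the stale destination file and retry.
        dst.unlink(missing_ok=True)
    raise RuntimeError(f"source kept changing after {max_attempts} attempts")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "src.dat", Path(tmp) / "dst.dat"
        src.write_bytes(b"v1")
        calls = []

        def flaky_copy(s, d):
            """Copy, then simulate a concurrent writer on the first call only."""
            shutil.copyfile(s, d)
            if not calls:
                Path(s).write_bytes(b"v2")
            calls.append(1)

        attempts = copy_with_validation(src, dst, copy=flaky_copy)
        print(attempts, dst.read_bytes())
```

In the demo the first copy is discarded because the source changed underneath it, and the second
attempt copies the new contents. The retry limit is a choice made for the sketch; the issue text
itself only says to repeat until the correct file is copied.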



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
