hadoop-mapreduce-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6840) Distcp to support cutoff time
Date Mon, 27 Mar 2017 13:00:43 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943218#comment-15943218 ]

Steve Loughran commented on MAPREDUCE-6840:
-------------------------------------------

A millis value is a pretty human-unfriendly number; the mechanism Configuration uses to
support the ms, s, m, h, d suffixes is better. I think it should be possible to use
{{configuration.getTimeDuration()}} to parse the duration arg simply by creating a no-default
Configuration, setting the property, then having it parse the string. Ugly but effective.
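
To make the suffix idea concrete, here is a minimal, dependency-free sketch of parsing a duration string with the ms, s, m, h, d suffixes mentioned above. It is a hand-rolled stand-in, not Hadoop's implementation; the class name {{DurationSketch}}, the method {{parseToMillis}}, the property key "cutoff", and the choice to treat a bare number as milliseconds are all assumptions made for illustration. The leading comment shows roughly what the Configuration-based route would look like with Hadoop on the classpath.

```java
import java.util.concurrent.TimeUnit;

// With Hadoop on the classpath, the approach described above would be roughly:
//   Configuration conf = new Configuration(false);   // false = load no default resources
//   conf.set("cutoff", "10m");                        // "cutoff" is a made-up key
//   long ms = conf.getTimeDuration("cutoff", 0L, TimeUnit.MILLISECONDS);

/** Hypothetical helper; not part of Hadoop or of the attached patch. */
final class DurationSketch {
    private DurationSketch() {}

    // "ms" is listed first so that "10ms" is not mistaken for seconds.
    private static final String[] SUFFIXES = {"ms", "s", "m", "h", "d"};
    private static final TimeUnit[] UNITS = {
        TimeUnit.MILLISECONDS, TimeUnit.SECONDS, TimeUnit.MINUTES,
        TimeUnit.HOURS, TimeUnit.DAYS
    };

    /** Parses strings such as "500ms", "10s", "5m", "2h", "1d" into milliseconds. */
    static long parseToMillis(String value) {
        String v = value.trim();
        for (int i = 0; i < SUFFIXES.length; i++) {
            if (v.endsWith(SUFFIXES[i])) {
                String digits = v.substring(0, v.length() - SUFFIXES[i].length()).trim();
                return UNITS[i].toMillis(Long.parseLong(digits));
            }
        }
        // Assumption of this sketch: a bare number is already in milliseconds.
        return Long.parseLong(v);
    }
}
```

A caller could then accept an argument such as "2h" and pass {{DurationSketch.parseToMillis("2h")}} wherever the option currently expects a raw millisecond count.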

> Distcp to support cutoff time
> -----------------------------
>
>                 Key: MAPREDUCE-6840
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6840
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 2.6.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>         Attachments: MAPREDUCE-6840.1.patch
>
>
> To ensure consistency in the datasets on HDFS, some projects, like file formats on Hive,
> do HDFS operations in a particular order. For example, if a file format uses an index file,
> a new version of the index file will only be written to HDFS after all files mentioned by
> the index are written to HDFS.
> When we do distcp, it's important to preserve that consistency, so that we don't break
> those file formats.
> A typical solution for that is to create an HDFS snapshot beforehand, and only distcp
> the snapshot. That could work well if the user has superuser privilege to make the directory
> snapshottable.
> If not, then it will be beneficial to have a cutoff time for distcp, so that distcp only
> copies files modified on/before that cutoff time.
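
The cutoff rule in the description can be sketched in a few lines. This is only an illustration of "copy files modified on or before the cutoff", not the logic of the attached patch; the class {{CutoffFilterSketch}}, the method {{selectForCopy}}, and the use of a plain map from path to modification time (in epoch milliseconds) are assumptions standing in for whatever file listing distcp actually builds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating the cutoff rule; not distcp code. */
final class CutoffFilterSketch {
    private CutoffFilterSketch() {}

    /**
     * Returns the paths whose modification time is on or before the cutoff.
     * Both the map values and cutoffMillis are epoch milliseconds.
     */
    static List<String> selectForCopy(Map<String, Long> modificationTimes, long cutoffMillis) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> entry : modificationTimes.entrySet()) {
            // "on/before": a file modified exactly at the cutoff is still copied.
            if (entry.getValue() <= cutoffMillis) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }
}
```

In the index-file scenario from the description, an index written after the cutoff would be skipped in this sketch, while data files written before the cutoff would still be selected.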



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

