hadoop-hdfs-issues mailing list archives

From "Manoj Govindassamy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-11218) Add option to skip open files during HDFS Snapshots
Date Fri, 25 Aug 2017 21:00:05 GMT

     [ https://issues.apache.org/jira/browse/HDFS-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manoj Govindassamy updated HDFS-11218:
--------------------------------------
    Fix Version/s: 3.0.0-beta1

> Add option to skip open files during HDFS Snapshots
> ---------------------------------------------------
>
>                 Key: HDFS-11218
>                 URL: https://issues.apache.org/jira/browse/HDFS-11218
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: snapshots
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>             Fix For: 2.9.0, 3.0.0-beta1
>
>
> *Problem:* 
> When HDFS Snapshots are taken while files are still being written, the Snapshots do
> capture those files, but they do not capture the point-in-time length of the open files.
> When such a file is closed, or when any other metadata modification is made to it,
> HDFS reconciles the file length and records the modification only in the most recently
> taken Snapshot. All earlier Snapshots still hold the same open file with no modification
> recorded, so they end up using the final modification record from the next available
> Snapshot.
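
The behaviour described above can be illustrated with a small toy model. This is only a sketch, not NameNode code: the class `OpenFileSnapshotSketch` and all of its methods are hypothetical, and it assumes that the length reconciled at close time is the file's final length.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only; NOT the HDFS implementation. All names here are hypothetical.
class OpenFileSnapshotSketch {
    // Snapshot id -> file length stored in that snapshot's modification record.
    private final TreeMap<Integer, Long> recordedLengths = new TreeMap<>();
    private long currentLength = 0;
    private int lastSnapshotId = 0;

    void append(long bytes) {
        currentLength += bytes;
    }

    // Taking a snapshot while the file is open records nothing for the file.
    int takeSnapshot() {
        return ++lastSnapshotId;
    }

    // Assumption: on close, the reconciled length is recorded only in the
    // most recently taken snapshot.
    void close() {
        if (lastSnapshotId > 0) {
            recordedLengths.put(lastSnapshotId, currentLength);
        }
    }

    // A snapshot with no record of its own uses the record of the next later
    // snapshot that has one, or the live file if there is none.
    long lengthInSnapshot(int snapshotId) {
        Map.Entry<Integer, Long> next = recordedLengths.ceilingEntry(snapshotId);
        return next != null ? next.getValue() : currentLength;
    }

    public static void main(String[] args) {
        OpenFileSnapshotSketch file = new OpenFileSnapshotSketch();
        file.append(100);
        int s1 = file.takeSnapshot(); // file held 100 bytes at this point
        file.append(150);
        int s2 = file.takeSnapshot(); // file held 250 bytes at this point
        file.append(150);
        file.close();                 // final length is 400 bytes
        System.out.println("s1 length = " + file.lengthInSnapshot(s1));
        System.out.println("s2 length = " + file.lengthInSnapshot(s2));
    }
}
```

In this model both snapshots report the final length rather than the 100 and 250 bytes the file held when each snapshot was taken, which is the inconsistency the issue describes.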
> *Proposal:*
> The design goal for HDFS Snapshots was O(M) space usage, where M is the number of
> file modifications. Recording modifications for every open file in every snapshot would
> therefore be very expensive. For applications that do not want incomplete, partially
> written binary files captured in snapshots, an extra option to skip open files would be
> preferable. That way, they do not have to worry about restoring inconsistent files from
> the snapshots.
> {noformat}
> hdfs dfs -createSnapshot -skipOpenFiles <snapDir> <snapName>
> {noformat}
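
The intended effect of the proposed option can be sketched as a simple filter over the files in the snapshot directory. This is a hypothetical illustration only: `SkipOpenFilesSketch`, `snapshotContents`, and the `skipOpenFiles` parameter are invented names, and nothing here reflects how the option is implemented in HDFS.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of the proposed option; all names are hypothetical.
class SkipOpenFilesSketch {
    // files maps a path to true when that file is currently open for write.
    static List<String> snapshotContents(Map<String, Boolean> files, boolean skipOpenFiles) {
        List<String> captured = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : files.entrySet()) {
            boolean openForWrite = entry.getValue();
            if (skipOpenFiles && openForWrite) {
                continue; // leave partially written files out of the snapshot
            }
            captured.add(entry.getKey());
        }
        return captured;
    }

    public static void main(String[] args) {
        Map<String, Boolean> files = new LinkedHashMap<>();
        files.put("/data/closed.bin", false);
        files.put("/data/being-written.bin", true);
        System.out.println("default:       " + snapshotContents(files, false));
        System.out.println("skipOpenFiles: " + snapshotContents(files, true));
    }
}
```

With the option off, both paths are captured; with it on, only the closed file is, so a restore from the snapshot never sees a partially written file.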



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


