hadoop-hdfs-issues mailing list archives

From "Manoj Govindassamy (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-11218) Add option to skip open files during HDFS Snapshots
Date Wed, 07 Dec 2016 19:27:58 GMT
Manoj Govindassamy created HDFS-11218:
-----------------------------------------

             Summary: Add option to skip open files during HDFS Snapshots
                 Key: HDFS-11218
                 URL: https://issues.apache.org/jira/browse/HDFS-11218
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: snapshots
    Affects Versions: 3.0.0-alpha1
            Reporter: Manoj Govindassamy
            Assignee: Manoj Govindassamy


Problem: 

When files are being written while HDFS Snapshots are taken in parallel, the Snapshots do
capture these files, but the point-in-time file length is not captured for the files that
are still open for write.

When such a file is later closed, or when any other metadata modification is made to it, HDFS
reconciles the file length and records the modification in the last taken Snapshot. All the
previously taken Snapshots continue to hold the same open file with no modification recorded.
So, all those previous Snapshots end up using the final modification record in the next
available Snapshot.
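
The behaviour described above can be illustrated with a small toy model. This is only a
sketch and not HDFS code: ToyFile, its diffs list and length_in_snapshot are hypothetical
names, and the lookup rule (use the first modification record at or after the requested
Snapshot, else the current state) is a simplification of what the paragraph above describes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFile:
    """Hypothetical stand-in for one file's snapshot bookkeeping (not HDFS code)."""
    length: int = 0
    is_open: bool = True
    latest_snapshot: int = 0
    # (snapshot_id, recorded_length) pairs in increasing snapshot_id order.
    diffs: list = field(default_factory=list)

    def take_snapshot(self) -> int:
        # Taking a snapshot records nothing for the open file.
        self.latest_snapshot += 1
        return self.latest_snapshot

    def append(self, nbytes: int) -> None:
        # Writes to the open file grow it without recording a modification.
        self.length += nbytes

    def close(self) -> None:
        # On close the length is reconciled and recorded once,
        # against the last taken snapshot only.
        if self.latest_snapshot > 0:
            self.diffs.append((self.latest_snapshot, self.length))
        self.is_open = False

    def length_in_snapshot(self, snapshot_id: int) -> int:
        # Use the first record at or after the requested snapshot,
        # falling back to the current length when none exists.
        for sid, recorded in self.diffs:
            if sid >= snapshot_id:
                return recorded
        return self.length


f = ToyFile()
f.append(100)
s1 = f.take_snapshot()   # file is 100 bytes long at this point
f.append(50)
s2 = f.take_snapshot()   # file is 150 bytes long at this point
f.append(25)
f.close()                # a single record is written, for s2

point_in_time = {s1: 100, s2: 150}
observed = {s: f.length_in_snapshot(s) for s in (s1, s2)}
print("expected:", point_in_time)
print("observed:", observed)
```

In this model both Snapshots report the length at close time rather than the length at the
time each Snapshot was taken, which is the inconsistency this issue is about.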

Proposal:

The design goal of HDFS Snapshots was O(M) space usage, where M is the number of file
modifications. So, it would be very expensive to record modifications for all the open files
in all the snapshots. For applications that do not want to capture incomplete / partially
written binary files in the snapshots, it would be preferable to have an extra option to skip
open files. This way, they don't have to worry about restoring inconsistent files from the
snapshots.

{noformat}
hdfs dfs -createSnapshot -skipOpenFiles <snapDir> <snapName>
{noformat}
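
The intended effect of the proposed option can be sketched as a simple filter. This is an
illustration only, under the assumption that "skip" means open files are left out of the
Snapshot entirely; FileEntry and snapshot_view are hypothetical names and do not correspond
to any HDFS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical description of one file under the snapshottable directory."""
    path: str
    length: int
    is_open: bool


def snapshot_view(files, skip_open_files=False):
    """Return {path: length} for the files a snapshot would expose.

    With skip_open_files=True, files that are still being written are
    left out, so only closed (complete) files appear in the result.
    """
    return {
        f.path: f.length
        for f in files
        if not (skip_open_files and f.is_open)
    }


files = [
    FileEntry("/data/part-0000", 1024, is_open=False),
    FileEntry("/data/part-0001", 512, is_open=True),   # still being written
]

default_view = snapshot_view(files)
skipping_view = snapshot_view(files, skip_open_files=True)
print(sorted(default_view))
print(sorted(skipping_view))
```

Without the option the partially written file is part of the Snapshot; with it, only the
closed file remains, so nothing incomplete can be restored from that Snapshot.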



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

