hadoop-hdfs-issues mailing list archives

From "Manoj Govindassamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11218) Add option to skip open files during HDFS Snapshots
Date Fri, 17 Feb 2017 19:07:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872327#comment-15872327 ]

Manoj Govindassamy commented on HDFS-11218:

I am currently working on HDFS-11402 (HDFS Snapshots should capture point-in-time copies
of OPEN files), which could also help alleviate the problems with HDFS Snapshots and open
files. Please take a look.

> Add option to skip open files during HDFS Snapshots
> ---------------------------------------------------
>                 Key: HDFS-11218
>                 URL: https://issues.apache.org/jira/browse/HDFS-11218
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: snapshots
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
> *Problem:* 
> When files are being written while HDFS Snapshots are taken in parallel, the snapshots do
> capture those files, but they do not capture the point-in-time length of the files that are
> still being written.
> When a previously open file is closed, or any other metadata modification operation is
> performed on it, HDFS reconciles the file length and records the modification in the most
> recently taken snapshot. All earlier snapshots continue to hold the same open file with no
> modification recorded, so they all end up using the final modification record from the next
> available snapshot, i.e. the reconciled length rather than the length at their own point in time.
> *Proposal:*
> The HDFS Snapshot design goal was O(M) space usage for snapshots, where M is the number of
> file modifications, so it would be very expensive to record modifications for all open files
> in all snapshots. For applications that do not want to capture incomplete, partially written
> binary files in their snapshots, it would be preferable to have an extra option to skip open
> files. That way, they don't have to worry about restoring inconsistent files from the snapshots.
> {noformat}
> hdfs dfs -createSnapshot -skipOpenFiles <snapDir> <snapName>
> {noformat}
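
The behaviour described in the issue can be illustrated with a toy Python model. This is a hypothetical sketch, not HDFS code: `ModelFile`, `ModelDir`, and the `skip_open_files` parameter are invented names, only the file length is tracked (real snapshot diffs hold more attributes), and it assumes the diff lookup works exactly as the issue describes (writes to an open file record nothing; a close records the reconciled length only in the latest snapshot; earlier snapshots fall through to the next recorded diff). `skip_open_files` merely mimics the proposed `-skipOpenFiles` option.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class ModelFile:
    """Toy model of one file: NOT HDFS code, only the length is tracked."""
    length: int = 0
    is_open: bool = True
    # snapshot id -> length recorded in that snapshot's diff
    diffs: Dict[int, int] = field(default_factory=dict)

    def append(self, nbytes: int) -> None:
        # Writes to an open file grow it but record no snapshot diff.
        self.length += nbytes

    def close(self, latest_snapshot: Optional[int]) -> None:
        # Close is a metadata change: the reconciled length is recorded only
        # in the most recently taken snapshot (once per snapshot).
        if latest_snapshot is not None and latest_snapshot not in self.diffs:
            self.diffs[latest_snapshot] = self.length
        self.is_open = False

    def length_in(self, snapshot_id: int) -> int:
        # A snapshot without its own diff falls through to the next later
        # snapshot that has one; if none exists, to the current length.
        later = sorted(s for s in self.diffs if s >= snapshot_id)
        return self.diffs[later[0]] if later else self.length


class ModelDir:
    """Toy snapshottable directory holding ModelFile objects (hypothetical)."""

    def __init__(self) -> None:
        self.files: Dict[str, ModelFile] = {}
        # snapshot name -> (snapshot id, names of files captured)
        self.snapshots: Dict[str, Tuple[int, Set[str]]] = {}
        self.next_id = 0

    def latest(self) -> Optional[int]:
        return self.next_id - 1 if self.next_id else None

    def create_snapshot(self, name: str, skip_open_files: bool = False) -> None:
        # skip_open_files is a hypothetical stand-in for the proposed
        # -skipOpenFiles option: open files are left out of the snapshot.
        captured = {n for n, f in self.files.items()
                    if not (skip_open_files and f.is_open)}
        self.snapshots[name] = (self.next_id, captured)
        self.next_id += 1

    def close_file(self, name: str) -> None:
        self.files[name].close(self.latest())

    def read_length(self, snap: str, name: str) -> Optional[int]:
        # Length of `name` as seen through snapshot `snap`, or None if absent.
        sid, captured = self.snapshots[snap]
        return self.files[name].length_in(sid) if name in captured else None


if __name__ == "__main__":
    d = ModelDir()
    d.files["log"] = ModelFile()
    d.files["log"].append(100)
    d.create_snapshot("s1")            # "log" is open at 100 bytes
    d.files["log"].append(50)
    d.create_snapshot("s2")            # "log" is open at 150 bytes
    d.files["log"].append(25)
    d.close_file("log")                # closed at 175 bytes
    print(d.read_length("s1", "log"), d.read_length("s2", "log"))  # 175 175

    e = ModelDir()
    e.files["log"] = ModelFile()
    e.files["done"] = ModelFile(length=10, is_open=False)
    e.files["log"].append(100)
    e.create_snapshot("s1", skip_open_files=True)
    print(e.read_length("s1", "log"), e.read_length("s1", "done"))  # None 10
```

In the first scenario both snapshots report the reconciled 175 bytes instead of the 100 and 150 bytes present when they were taken; in the second, the open file is simply absent from the snapshot while the closed file is captured as usual.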

This message was sent by Atlassian JIRA
