hadoop-hdfs-issues mailing list archives

From "Rushabh S Shah (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12996) DataNode Replica Trash
Date Mon, 08 Jan 2018 23:34:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317342#comment-16317342 ]

Rushabh S Shah commented on HDFS-12996:

[~hanishakoneru]: Looks like a good improvement. Thanks for the design.
I skimmed through the design document and have a couple of questions.
1. Suppose user1 and user2 each deleted one of their directories (let's say dir1 and dir2 respectively).
If user1 wants to recover dir1, will dir2 be recovered as well?
2. Another scenario I am concerned about:
Many of our clients (let's say user1) use {{/tmp/<userId>}} to store their intermediate
task output (to work around quota problems).
After a job completes, they delete this space and reuse the same location for the next job.
In the meantime, if some other user (let's say user2) wants to recover a mistakenly deleted
directory, then we will go back in time for user1 as well, which might corrupt user1's output directory.
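
To make the second scenario concrete, here is a toy model of it. This is only a sketch, not HDFS code: the {{ToyNamespace}} class, its methods, and the paths are all hypothetical, and it assumes that a restore is global (everything in the trash comes back at once), which is exactly the behaviour the question above is asking about.

```python
class ToyNamespace:
    """Hypothetical stand-in for a file system namespace; not an HDFS API."""

    def __init__(self):
        self.files = {}  # path -> current contents
        self.trash = {}  # path -> contents at the time of the delete

    def write(self, path, data):
        self.files[path] = data

    def delete(self, path):
        # Move the entry into the trash instead of purging it.
        self.trash[path] = self.files.pop(path)

    def restore_all(self):
        # Assumption: restore is all-or-nothing, not per user or per path.
        self.files.update(self.trash)
        self.trash.clear()


ns = ToyNamespace()
ns.write("/user/user2/data", "important")
ns.write("/tmp/user1/out", "job1 output")

ns.delete("/tmp/user1/out")                # user1 cleans up after job 1
ns.delete("/user/user2/data")              # user2 deletes by mistake
ns.write("/tmp/user1/out", "job2 output")  # user1 reuses the same location

ns.restore_all()                           # recovery requested for user2

print(ns.files["/user/user2/data"])  # user2's data is back
print(ns.files["/tmp/user1/out"])    # user1 now sees job 1's output again
```

Under that assumption, user2 gets the directory back, but user1's current output is silently replaced by the older one. If the restore can be scoped to a path or a user, this problem would not arise.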

Also, the design looks very similar to Checkpointing/Snapshots.

> DataNode Replica Trash
> ----------------------
>                 Key: HDFS-12996
>                 URL: https://issues.apache.org/jira/browse/HDFS-12996
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hanisha Koneru
>            Assignee: Hanisha Koneru
>         Attachments: DataNode_Replica_Trash_Design_Doc.pdf
>
> DataNode Replica Trash will allow administrators to recover from a recent delete request
> that resulted in catastrophic loss of user data. This is achieved by placing all invalidated
> blocks in a replica trash on the datanode before completely purging them from the system.
> The design doc is attached here.
