hadoop-hdfs-issues mailing list archives

From "Scott Chen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1143) Implement Background deletion
Date Fri, 14 May 2010 20:30:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867658#action_12867658 ]

Scott Chen commented on HDFS-1143:

Hey Koji,

HDFS-173 helps other clients because it releases the lock from time to time while removing
blocks. That is very nice.
I think there is still room for improvement.
1. collectSubtreeBlocksAndClear() is called inside the global lock. This is not necessary
because once removeChild() has been called, the subtree is no longer referenced from outside,
so it is OK to do this without the lock. Not holding the lock here improves efficiency.
2. The client that performs the deletion does not have to wait for the blocks to be cleared.
Once the node is removed from the iNode tree and the file lease is cleared, the deletion should
be considered finished. The rest can be moved to the background. This way the client that does
the deletion will get a better response time.

I think what Dhruba says makes sense. To be more specific, we can
1. Do removeChild() and removeLeaseWithPrefixPath(), then just launch the background cleanup
2. In the background task,
    a. Do collectSubtreeBlocksAndClear() without any lock
    b. Hold the global lock and delete blocks in small batches to avoid holding the lock too long

I think the bottom line is that we should keep only the atomic operations inside the lock and
move everything else to the background.
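
To make the proposal concrete, here is a minimal, self-contained Java sketch of the two-phase
deletion described above. It is not the actual namenode code: the Node class, the blocksMap set,
BATCH_SIZE, and the single-threaded cleaner are hypothetical stand-ins chosen for illustration.
Only the shape follows the steps above: detach under the lock, collect blocks without the lock,
then remove blocks in small batches, taking the lock once per batch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

class BackgroundDeleteSketch {
    /** Hypothetical stand-in for an inode: a name, child nodes, and block ids. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        final List<Long> blocks = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    static final int BATCH_SIZE = 1000;                      // assumed batch size
    final Node root = new Node("/");
    private final ReentrantLock globalLock = new ReentrantLock();
    private final Set<Long> blocksMap = new HashSet<>();     // guarded by globalLock
    private final ExecutorService cleaner = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "background-cleanup");
        t.setDaemon(true);                                   // do not keep the JVM alive
        return t;
    });

    Node addChild(Node parent, String name) {
        globalLock.lock();
        try { Node n = new Node(name); parent.children.add(n); return n; }
        finally { globalLock.unlock(); }
    }

    void addBlock(Node file, long blockId) {
        globalLock.lock();
        try { file.blocks.add(blockId); blocksMap.add(blockId); }
        finally { globalLock.unlock(); }
    }

    int blockCount() {
        globalLock.lock();
        try { return blocksMap.size(); } finally { globalLock.unlock(); }
    }

    /** Step 1: detach the subtree under the lock, then hand it to the background task. */
    CompletableFuture<Integer> delete(Node parent, Node target) {
        globalLock.lock();
        try {
            if (!parent.children.remove(target)) {
                return CompletableFuture.completedFuture(0);
            }
            // Lease removal for the deleted path would also happen here, under the lock.
        } finally { globalLock.unlock(); }
        return CompletableFuture.supplyAsync(() -> cleanup(target), cleaner);
    }

    /** Step 2: collect blocks without the lock, then remove them in small locked batches. */
    private int cleanup(Node detached) {
        List<Long> blocks = new ArrayList<>();
        collectSubtreeBlocks(detached, blocks);              // subtree is unreachable: no lock
        int removed = 0;
        for (int i = 0; i < blocks.size(); i += BATCH_SIZE) {
            int end = Math.min(i + BATCH_SIZE, blocks.size());
            globalLock.lock();
            try {
                for (Long b : blocks.subList(i, end)) {
                    if (blocksMap.remove(b)) removed++;
                }
            } finally { globalLock.unlock(); }               // other clients can run between batches
        }
        return removed;
    }

    private static void collectSubtreeBlocks(Node n, List<Long> out) {
        out.addAll(n.blocks);
        n.blocks.clear();
        for (Node c : n.children) collectSubtreeBlocks(c, out);
    }

    public static void main(String[] args) {
        BackgroundDeleteSketch ns = new BackgroundDeleteSketch();
        Node dir = ns.addChild(ns.root, "dir");
        for (long b = 0; b < 2500; b++) ns.addBlock(dir, b);
        int removed = ns.delete(ns.root, dir).join();        // join() only for the demo
        System.out.println("removed=" + removed + " remaining=" + ns.blockCount());
    }
}
```

In this sketch a real client would not call join(); the point is that delete() returns as soon as
the subtree is detached, and the lock is released between batches so other clients can make progress.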

> Implement Background deletion
> -----------------------------
>                 Key: HDFS-1143
>                 URL: https://issues.apache.org/jira/browse/HDFS-1143
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Dmytro Molkov
>            Assignee: Scott Chen
>             Fix For: 0.22.0
> Right now, if you try to delete a massive number of files from the namenode, it will freeze
> (sometimes for minutes). Most of the time is spent going through the blocks map and invalidating
> all the blocks.
> This can probably be improved by having a background GC process. The deletion will basically
> just remove the inode being deleted and then give the subtree that was just deleted to the
> background thread running cleanup.
> This way the namenode becomes available for the clients soon after deletion, and all
> the heavy operations are done in the background.
> Thoughts?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
