hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-994) DFS Scalability: a BlockReport that returns a large number of blocks-to-be-deleted causes the datanode to lose connectivity to the namenode
Date Fri, 02 Mar 2007 22:04:50 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-994:
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.12.0
           Status: Resolved  (was: Patch Available)

I committed this.  Thanks, Dhruba!

> DFS Scalability: a BlockReport that returns a large number of blocks-to-be-deleted causes the datanode to lose connectivity to the namenode
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-994
>                 URL: https://issues.apache.org/jira/browse/HADOOP-994
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>             Fix For: 0.12.0
>
>         Attachments: blockReportInvalidateBlock.patch
>
>
> The Datanode periodically invokes a block report RPC on the Namenode. This RPC returns the
> list of blocks that the Datanode should invalidate. The Datanode then starts to delete all
> the corresponding files. This block deletion is done by the heartbeat thread in the Datanode.
> If the number of files to be deleted is large, the Datanode stops sending heartbeats for that
> entire duration. The Namenode then declares the Datanode "dead" and starts replicating its
> blocks.
> In my observed case, the block report returned 1669 blocks that were to be invalidated.
> The Datanode was running on a RAID5 ext3 filesystem and 4 active tasks were running on it.
> The deletion of these 1669 files took about 30 minutes. Wow! The average disk service time
> during this period was less than 10 ms, and the Datanode was using about 30% CPU during this time.
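For scale, 1669 deletions in about 30 minutes works out to roughly 1800 s / 1669 ≈ 1.1 s per file, some two orders of magnitude above the sub-10 ms disk service time, and all of it spent on the thread that should be heartbeating. The sketch below is not the actual DataNode code and not necessarily what the attached patch does; every class, method, and field name in it is hypothetical. It only illustrates why running deletions on the heartbeat thread starves heartbeats, and how handing the files to a separate deleter thread would keep the heartbeat interval steady.

import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal sketch, not the actual DataNode code; all names are illustrative.
public class HeartbeatSketch {

    // Blocks scheduled for deletion by the reply to the last block report.
    private final BlockingQueue<String> toDelete = new LinkedBlockingQueue<String>();

    // Heartbeat loop: only enqueues deletion work and never performs the slow disk I/O,
    // so a large invalidation list cannot delay the next heartbeat.
    void heartbeatLoop() throws InterruptedException {
        while (true) {
            sendHeartbeat();                          // cheap RPC, must run every few seconds
            List<String> invalid = sendBlockReport(); // reply names the blocks to invalidate
            toDelete.addAll(invalid);                 // hand the slow work to the deleter thread
            Thread.sleep(3000);
        }
    }

    // Deleter thread: drains the queue and does the slow per-file disk I/O.
    void deleterLoop() throws InterruptedException {
        while (true) {
            deleteBlockFile(toDelete.take());         // may take a second or more per file on a busy disk
        }
    }

    // Stubs standing in for the real RPCs and filesystem calls.
    void sendHeartbeat() { }
    List<String> sendBlockReport() { return Collections.emptyList(); }
    void deleteBlockFile(String path) { }

    public static void main(String[] args) {
        final HeartbeatSketch node = new HeartbeatSketch();
        new Thread(new Runnable() {
            public void run() {
                try { node.deleterLoop(); } catch (InterruptedException e) { }
            }
        }, "block-deleter").start();
        try { node.heartbeatLoop(); } catch (InterruptedException e) { }
    }
}

The point of the split is that heartbeatLoop() never blocks on disk I/O: even a block report that names thousands of files only costs the heartbeat thread an addAll() call, while the deleter thread absorbs the 30-minute deletion pass in the background.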


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

