hadoop-hdfs-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matthew Foley <ma...@yahoo-inc.com>
Subject Re: Question regarding datanode been wiped by hadoop
Date Tue, 12 Apr 2011 17:09:01 GMT
Here's another hypothesis.  You'll have to check your namenode logs to see if it is the correct

In a healthy cluster, when one datanode goes down, 10 minutes later the namenode will "notice"
it, and mark all its blocks as under-replicated (in namenode memory).  It will then generate
replication requests to other holders of those blocks, to get back to the required replication
count for each.  In a large cluster, this process can take as little as a few minutes, because
there are many candidate senders and receivers for the copy operations.

When the datanode comes back a whole day later and sends its Initial Block Report, which we'll
assume still has all the blocks it did before, since the disks were not corrupted, then all
those blocks will be marked as OVER-replicated.  The namenode will generate delete requests
for one replica for each of those blocks.

Now, I would have expected those delete requests to be randomly distributed across all nodes
holding replicas of those blocks.  But there is some indication in the code that all the deletes
may go to the new excess source, especially if the number of blocks is small (say, under one
thousand).  I'm not sure whether this is a sufficient explanation or not.

If the block deletes were ordered by the namenode, they may be in the namenode logs, with
a prefix "hdfs.StateChange: BLOCK*", like this:
11/04/12 10:00:28 INFO hdfs.StateChange: BLOCK* NameSystem.chooseExcessReplicates: (,
blk_-44847565016350316_1001) is added to recentInvalidateSets
11/04/12 10:00:28 INFO hdfs.StateChange: BLOCK* ask to delete  blk_-44847565016350316_1001

Obviously you have to be logging at INFO level for these block-level manipulations to be in
your namenode logs.
The datanode then echoes it with log lines like:
11/04/12 10:00:29 INFO datanode.DataNode: Scheduling block blk_-44847565016350316_1001 file
.../current/finalized/blk_-44847565016350316 for deletion
11/04/12 10:00:29 INFO datanode.DataNode: Deleted block blk_-44847565016350316_1001 at file

Your log extract below shows that the datanode was logging at INFO level.  See if your namenode
was too, and if you can show that it generated a bunch of delete requests shortly after your
repaired datanode came up.


On Apr 12, 2011, at 9:30 AM, felix gao wrote:

mapred.local.dir is under /hadoop/mapred  defs.data.dir is /hadoop./dfs. The logs showing

011-04-11 14:34:10,987 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2011-04-11 14:34:10,987 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020:
2011-04-11 14:34:10,988 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020:
2011-04-11 14:34:10,988 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration
= DatanodeRegistration(had41.xxx:50010, storageID=DS-922075132-,
infoPort=50075, ipcPort=50020)
2011-04-11 14:34:10,988 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020:
2011-04-11 14:34:11,021 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(<>,
storageID=DS-922075132-, infoPort=50075, ipcPort=50020)In
DataNode.run, data = FSDataset{dirpath='/hadoop/dfs/current'}
2011-04-11 14:34:11,021 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL
of 3600000msec Initial delay: 0msec
2011-04-11 14:34:15,545 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport
of 110651 blocks got processed in 4493 msecs
2011-04-11 14:34:15,545 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic
block scanner.
2011-04-11 14:34:15,692 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
blk_-9213431071914219029_3395500 file /hadoop/dfs/current/subdir7/subdir26/blk_-9213431071914219029

then followed by all the blocks in the dfs/current directory.  Seems to me hadoop just want
to invalidate all the blocks in that box by deleting all of it.

On Tue, Apr 12, 2011 at 9:17 AM, Harsh J <harsh@cloudera.com<mailto:harsh@cloudera.com>>
Apart from ensuring all that others here have said, are your
mapred.local.dir and dfs.data.dir pointing to the same directory by
any chance? If that so happens, the tasktracker could potentially wipe
out the datanode directories when restarted.

On Tue, Apr 12, 2011 at 8:16 PM, felix gao <gre1600@gmail.com<mailto:gre1600@gmail.com>>
> What reason/condition would cause a datanode’s blocks to be removed?   Our
> cluster had a one of its datanodes crash because of bad RAM.   After the
> system was upgraded and the datanode/tasktracker brought online the next day
> we noticed the amount of space utilized was minimal and the cluster was
> rebalancing blocks to the datanode.   It would seem the prior blocks were
> removed.   Was this because the datanode was declared dead?   What is the
> criteria for a namenode to decide (Assuming its the namenode) when a
> datanode should remove prior blocks?

Harsh J

View raw message