hadoop-hdfs-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jakob Homan <jho...@yahoo-inc.com>
Subject Re: HDFS file disappeared
Date Fri, 12 Nov 2010 00:38:06 GMT
 >Would be great if someone wrote some tools that, given a block ID, tracked
 >the life of the file that contained it (including renames of containing
 > dirs, etc). Shouldn't be too difficult.

There's a tool for this in MapRed's contrib section under 
block_forensics.  It was released in 21, I believe.  It hasn't been kept 
up to date though, so I'm not sure how functional it still is.
-Jakob

Todd Lipcon wrote:
> What's the last audit log entry prior to 2010-11-10 21:42:33,389?
> 
> -Todd
> 
> On Thu, Nov 11, 2010 at 2:10 PM, David Rosenstrauch <darose@darose.net 
> <mailto:darose@darose.net>> wrote:
> 
>     Saw a couple more references to that block before the "to delete
>      blk" messages:
> 
>     2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange:
>     BLOCK* NameSystem.addToInvalidates: blk_-4237880568969698703 is
>     added to invalidSet of <our ip prefix>.169:50010
>     2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange:
>     BLOCK* NameSystem.addToInvalidates: blk_-4237880568969698703 is
>     added to invalidSet of <our ip prefix>.173:50010
>     2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange:
>     BLOCK* NameSystem.addToInvalidates: blk_-4237880568969698703 is
>     added to invalidSet of <our ip prefix>.176:50010
> 
>     Again, I'm not sure why this is happening though.
> 
> 
>     BTW, I appreciate your comments below (about it getting moved out of
>     the temp directory and then getting removed in another pass).  But I
>     grepped the logs as you suggested, and I still don't see how it got
>     moved/deleted:
> 
>     [root@hdmaster hadoop-0.20]# grep
>     _attempt_201010221550_0418_r_000001_0
>     hadoop-hadoop-namenode-hdmaster.log.2010-11-10
>     2010-11-10 21:42:28,442 INFO
>     org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
>     ugi=root,root,bin,daemon,sys,adm,disk,wheel     ip=/<our ip
>     root>.176     cmd=create      src=<our root
>     dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
>      dst=null        perm=root:supergroup:rw-r--r--
> 
>     2010-11-10 21:42:29,802 INFO org.apache.hadoop.hdfs.StateChange:
>     BLOCK* NameSystem.allocateBlock: <our root
>     dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state.
>     blk_-4237880568969698703_13404582
>     2010-11-10 21:42:30,360 INFO org.apache.hadoop.hdfs.StateChange:
>     Removing lease on  file <our root
>     dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
>     from client DFSClient_attempt_201010221550_0418_r_000001_0
>     2010-11-10 21:42:30,360 INFO org.apache.hadoop.hdfs.StateChange:
>     DIR* NameSystem.completeFile: file <our root
>     dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
>     is closed by DFSClient_attempt_201010221550_0418_r_000001_0
> 
> 
>     I have to say, I'm really quite stumped.  Been looking at this all
>     afternoon, and I still have no idea how/why that file got purged.  :-(
> 
>     Thanks,
> 
>     DR
> 
> 
>     On 11/11/2010 02:13 PM, Todd Lipcon wrote:
> 
>         Given that it's an MR output, my guess is it got moved out of
>         the temporary
>         directory when the job "Committed" and then was removed as
>         another pass. I'd
>         grep for the containing directory name in the audit logs to see
>         where it got
>         moved to and how it was eventually deleted.
> 
>         Would be great if someone wrote some tools that, given a block
>         ID, tracked
>         the life of the file that contained it (including renames of
>         containing
>         dirs, etc). Shouldn't be too difficult.
> 
>         -Todd
> 
>         On Thu, Nov 11, 2010 at 9:38 AM, David
>         Rosenstrauch<darose@darose.net <mailto:darose@darose.net>>wrote:
> 
>             Sorry, I stand corrected.  When I grep the name node logs
>             for the block ID
> 
>             there are some additional lines:
> 
>             [root@hdmaster hadoop-0.20]# grep 4237880568969698703_13404582
>             hadoop-hadoop-namenode-hdmaster.log.2010-11-10
> 
>             2010-11-10 21:42:29,802 INFO
>             org.apache.hadoop.hdfs.StateChange: BLOCK*
>             NameSystem.allocateBlock:<our root
>             dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state.
>             blk_-4237880568969698703_13404582
>              2010-11-10 21:42:30,355 INFO
>             org.apache.hadoop.hdfs.StateChange: BLOCK*
>             NameSystem.addStoredBlock: blockMap updated:<our ip
>             prefix>.169:50010 is
>             added to blk_-4237880568969698703_13404582 size 326522
>             2010-11-10 21:42:30,356 INFO
>             org.apache.hadoop.hdfs.StateChange: BLOCK*
>             NameSystem.addStoredBlock: blockMap updated:<our ip
>             prefix>.173:50010 is
>             added to blk_-4237880568969698703_13404582 size 326522
>             2010-11-10 21:42:30,357 INFO
>             org.apache.hadoop.hdfs.StateChange: BLOCK*
>             NameSystem.addStoredBlock: blockMap updated:<our ip
>             prefix>.176:50010 is
>             added to blk_-4237880568969698703_13404582 size 326522
>             2010-11-10 21:42:33,585 INFO
>             org.apache.hadoop.hdfs.StateChange: BLOCK* ask
>             <our ip prefix>.176:50010 to delete
>              blk_-4237880568969698703_13404582
>             2010-11-10 21:42:42,598 INFO
>             org.apache.hadoop.hdfs.StateChange: BLOCK* ask
>             <our ip prefix>.173:50010 to delete
>              blk_719995703651674050_13393533
>             blk_2844104197159113873_13393516
>             blk_6711472063024831893_13393527
>             blk_240592081553699046_13393054 blk_6933038421638165347_13393232
>             blk_-1288897671244436593_13393351
>             blk_1408176921409258101_13393152
>             blk_-4237880568969698703_13404582
>             blk_-4188858742780299895_13393410
>             blk_-2670318277937301225_13393450
>             blk_-1858397614006984730_13393480
>             blk_6292330466651227068_13393505
>             2010-11-10 21:42:42,598 INFO
>             org.apache.hadoop.hdfs.StateChange: BLOCK* ask
>             <our ip prefix>.169:50010 to delete
>              blk_719995703651674050_13393533
>             blk_6711472063024831893_13393527
>             blk_-514463419649319075_13404582
>             blk_-1980639884885154260_13404582
>             blk_240592081553699046_13393054
>             blk_8520588884031249451_13393428
>             blk_-6620589068796293265_13393089
>             blk_6933038421638165347_13393232
>             blk_-1288897671244436593_13393351
>             blk_-3581082286710707012_13393556
>             blk_1408176921409258101_13393152
>             blk_-1858397614006984730_13393480
>             blk_6292330466651227068_13393505
>             blk_2844104197159113873_13393516
>             blk_-4237880568969698703_13404582
>             blk_-4188858742780299895_13393410
>             blk_-2670318277937301225_13393450
> 
> 
>             I'm still not clear why those deletes are happening though.
>              Will read
>             through the logs again at those locations.  But if anyone's
>             got any ideas,
>             pls chime in.
> 
>             DR
> 
> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera


Mime
View raw message