hadoop-hdfs-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Todd Lipcon <t...@cloudera.com>
Subject Re: HDFS file disappeared
Date Fri, 12 Nov 2010 00:14:31 GMT
What's the last audit log entry prior to 2010-11-10 21:42:33,389?

-Todd

On Thu, Nov 11, 2010 at 2:10 PM, David Rosenstrauch <darose@darose.net>wrote:

> Saw a couple more references to that block before the "to delete  blk"
> messages:
>
> 2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addToInvalidates: blk_-4237880568969698703 is added to invalidSet
> of <our ip prefix>.169:50010
> 2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addToInvalidates: blk_-4237880568969698703 is added to invalidSet
> of <our ip prefix>.173:50010
> 2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addToInvalidates: blk_-4237880568969698703 is added to invalidSet
> of <our ip prefix>.176:50010
>
> Again, I'm not sure why this is happening though.
>
>
> BTW, I appreciate your comments below (about it getting moved out of the
> temp directory and then getting removed in another pass).  But I grepped the
> logs as you suggested, and I still don't see how it got moved/deleted:
>
> [root@hdmaster hadoop-0.20]# grep _attempt_201010221550_0418_r_000001_0
> hadoop-hadoop-namenode-hdmaster.log.2010-11-10
> 2010-11-10 21:42:28,442 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
> ugi=root,root,bin,daemon,sys,adm,disk,wheel     ip=/<our ip root>.176
> cmd=create      src=<our root
> dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
>  dst=null        perm=root:supergroup:rw-r--r--
>
> 2010-11-10 21:42:29,802 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.allocateBlock: <our root
> dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state.
> blk_-4237880568969698703_13404582
> 2010-11-10 21:42:30,360 INFO org.apache.hadoop.hdfs.StateChange: Removing
> lease on  file <our root
> dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
> from client DFSClient_attempt_201010221550_0418_r_000001_0
> 2010-11-10 21:42:30,360 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.completeFile: file <our root
> dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
> is closed by DFSClient_attempt_201010221550_0418_r_000001_0
>
>
> I have to say, I'm really quite stumped.  Been looking at this all
> afternoon, and I still have no idea how/why that file got purged.  :-(
>
> Thanks,
>
> DR
>
>
> On 11/11/2010 02:13 PM, Todd Lipcon wrote:
>
>> Given that it's an MR output, my guess is it got moved out of the
>> temporary
>> directory when the job "Committed" and then was removed as another pass.
>> I'd
>> grep for the containing directory name in the audit logs to see where it
>> got
>> moved to and how it was eventually deleted.
>>
>> Would be great if someone wrote some tools that, given a block ID, tracked
>> the life of the file that contained it (including renames of containing
>> dirs, etc). Shouldn't be too difficult.
>>
>> -Todd
>>
>> On Thu, Nov 11, 2010 at 9:38 AM, David Rosenstrauch<darose@darose.net
>> >wrote:
>>
>>> Sorry, I stand corrected.  When I grep the name node logs for the block
>>> ID
>>>
>>> there are some additional lines:
>>>
>>> [root@hdmaster hadoop-0.20]# grep 4237880568969698703_13404582
>>> hadoop-hadoop-namenode-hdmaster.log.2010-11-10
>>>
>>> 2010-11-10 21:42:29,802 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>>> NameSystem.allocateBlock:<our root
>>>
>>> dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state.
>>> blk_-4237880568969698703_13404582
>>>  2010-11-10 21:42:30,355 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>>> NameSystem.addStoredBlock: blockMap updated:<our ip prefix>.169:50010 is
>>> added to blk_-4237880568969698703_13404582 size 326522
>>> 2010-11-10 21:42:30,356 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>>> NameSystem.addStoredBlock: blockMap updated:<our ip prefix>.173:50010 is
>>> added to blk_-4237880568969698703_13404582 size 326522
>>> 2010-11-10 21:42:30,357 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>>> NameSystem.addStoredBlock: blockMap updated:<our ip prefix>.176:50010 is
>>> added to blk_-4237880568969698703_13404582 size 326522
>>> 2010-11-10 21:42:33,585 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>>> ask
>>> <our ip prefix>.176:50010 to delete  blk_-4237880568969698703_13404582
>>> 2010-11-10 21:42:42,598 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>>> ask
>>> <our ip prefix>.173:50010 to delete  blk_719995703651674050_13393533
>>> blk_2844104197159113873_13393516 blk_6711472063024831893_13393527
>>> blk_240592081553699046_13393054 blk_6933038421638165347_13393232
>>> blk_-1288897671244436593_13393351 blk_1408176921409258101_13393152
>>> blk_-4237880568969698703_13404582 blk_-4188858742780299895_13393410
>>> blk_-2670318277937301225_13393450 blk_-1858397614006984730_13393480
>>> blk_6292330466651227068_13393505
>>> 2010-11-10 21:42:42,598 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>>> ask
>>> <our ip prefix>.169:50010 to delete  blk_719995703651674050_13393533
>>> blk_6711472063024831893_13393527 blk_-514463419649319075_13404582
>>> blk_-1980639884885154260_13404582 blk_240592081553699046_13393054
>>> blk_8520588884031249451_13393428 blk_-6620589068796293265_13393089
>>> blk_6933038421638165347_13393232 blk_-1288897671244436593_13393351
>>> blk_-3581082286710707012_13393556 blk_1408176921409258101_13393152
>>> blk_-1858397614006984730_13393480 blk_6292330466651227068_13393505
>>> blk_2844104197159113873_13393516 blk_-4237880568969698703_13404582
>>> blk_-4188858742780299895_13393410 blk_-2670318277937301225_13393450
>>>
>>>
>>> I'm still not clear why those deletes are happening though.  Will read
>>> through the logs again at those locations.  But if anyone's got any
>>> ideas,
>>> pls chime in.
>>>
>>> DR
>>>
>>


-- 
Todd Lipcon
Software Engineer, Cloudera

Mime
View raw message