hadoop-hdfs-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Rosenstrauch <dar...@darose.net>
Subject Re: HDFS file disappeared
Date Thu, 18 Nov 2010 18:37:53 GMT
Tnx!

Checking the logs now ...

DR

On 11/18/2010 01:22 PM, Todd Lipcon wrote:
> The "audit log" is a set of entries in the normal NN log by default. It can
> be separated out into another log file if preferred, though it's not by
> default.
>
> Try grepping for "cmd=" in your NN log, you'll see what we're talking about.
>
> -Todd
>
> On Thu, Nov 18, 2010 at 7:24 AM, David Rosenstrauch<darose@darose.net>wrote:
>
>> Sorry, just first had a moment to get back to this now.
>>
>> ???  Never heard of the "audit log" before, and I don't see any file named
>> like that in the hadoop logs directory on the name node.  Can you elaborate
>> what the audit log is and/or where it might be found?
>>
>> Thanks,
>>
>> DR
>>
>>
>> On 11/11/2010 07:14 PM, Todd Lipcon wrote:
>>
>>> What's the last audit log entry prior to 2010-11-10 21:42:33,389?
>>>
>>> -Todd
>>>
>>> On Thu, Nov 11, 2010 at 2:10 PM, David Rosenstrauch<darose@darose.net
>>>> wrote:
>>>
>>>   Saw a couple more references to that block before the "to delete  blk"
>>>> messages:
>>>>
>>>> 2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>>>> NameSystem.addToInvalidates: blk_-4237880568969698703 is added to
>>>> invalidSet
>>>> of<our ip prefix>.169:50010
>>>> 2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>>>> NameSystem.addToInvalidates: blk_-4237880568969698703 is added to
>>>> invalidSet
>>>> of<our ip prefix>.173:50010
>>>> 2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>>>> NameSystem.addToInvalidates: blk_-4237880568969698703 is added to
>>>> invalidSet
>>>> of<our ip prefix>.176:50010
>>>>
>>>> Again, I'm not sure why this is happening though.
>>>>
>>>>
>>>> BTW, I appreciate your comments below (about it getting moved out of the
>>>> temp directory and then getting removed in another pass).  But I grepped
>>>> the
>>>> logs as you suggested, and I still don't see how it got moved/deleted:
>>>>
>>>> [root@hdmaster hadoop-0.20]# grep _attempt_201010221550_0418_r_000001_0
>>>> hadoop-hadoop-namenode-hdmaster.log.2010-11-10
>>>> 2010-11-10 21:42:28,442 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
>>>> ugi=root,root,bin,daemon,sys,adm,disk,wheel     ip=/<our ip root>.176
>>>> cmd=create      src=<our root
>>>>
>>>> dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
>>>>   dst=null        perm=root:supergroup:rw-r--r--
>>>>
>>>> 2010-11-10 21:42:29,802 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
>>>> NameSystem.allocateBlock:<our root
>>>>
>>>> dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state.
>>>> blk_-4237880568969698703_13404582
>>>> 2010-11-10 21:42:30,360 INFO org.apache.hadoop.hdfs.StateChange: Removing
>>>> lease on  file<our root
>>>>
>>>> dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
>>>> from client DFSClient_attempt_201010221550_0418_r_000001_0
>>>> 2010-11-10 21:42:30,360 INFO org.apache.hadoop.hdfs.StateChange: DIR*
>>>> NameSystem.completeFile: file<our root
>>>>
>>>> dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
>>>> is closed by DFSClient_attempt_201010221550_0418_r_000001_0
>>>>
>>>>
>>>> I have to say, I'm really quite stumped.  Been looking at this all
>>>> afternoon, and I still have no idea how/why that file got purged.  :-(
>>>>
>>>> Thanks,
>>>>
>>>> DR
>>>>
>>>>
>>>> On 11/11/2010 02:13 PM, Todd Lipcon wrote:
>>>>
>>>>   Given that it's an MR output, my guess is it got moved out of the
>>>>> temporary
>>>>> directory when the job "Committed" and then was removed as another pass.
>>>>> I'd
>>>>> grep for the containing directory name in the audit logs to see where
it
>>>>> got
>>>>> moved to and how it was eventually deleted.
>>>>>
>>>>> Would be great if someone wrote some tools that, given a block ID,
>>>>> tracked
>>>>> the life of the file that contained it (including renames of containing
>>>>> dirs, etc). Shouldn't be too difficult.
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Thu, Nov 11, 2010 at 9:38 AM, David Rosenstrauch<darose@darose.net
>>>>>
>>>>>> wrote:
>>>>>>
>>>>>
>>>>>   Sorry, I stand corrected.  When I grep the name node logs for the block
>>>>>> ID
>>>>>>
>>>>>> there are some additional lines:
>>>>>>
>>>>>> [root@hdmaster hadoop-0.20]# grep 4237880568969698703_13404582
>>>>>> hadoop-hadoop-namenode-hdmaster.log.2010-11-10
>>>>>>
>>>>>> 2010-11-10 21:42:29,802 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK*
>>>>>> NameSystem.allocateBlock:<our root
>>>>>>
>>>>>>
>>>>>> dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state.
>>>>>> blk_-4237880568969698703_13404582
>>>>>>   2010-11-10 21:42:30,355 INFO org.apache.hadoop.hdfs.StateChange:
>>>>>> BLOCK*
>>>>>> NameSystem.addStoredBlock: blockMap updated:<our ip prefix>.169:50010
>>>>>> is
>>>>>> added to blk_-4237880568969698703_13404582 size 326522
>>>>>> 2010-11-10 21:42:30,356 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK*
>>>>>> NameSystem.addStoredBlock: blockMap updated:<our ip prefix>.173:50010
>>>>>> is
>>>>>> added to blk_-4237880568969698703_13404582 size 326522
>>>>>> 2010-11-10 21:42:30,357 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK*
>>>>>> NameSystem.addStoredBlock: blockMap updated:<our ip prefix>.176:50010
>>>>>> is
>>>>>> added to blk_-4237880568969698703_13404582 size 326522
>>>>>> 2010-11-10 21:42:33,585 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK*
>>>>>> ask
>>>>>> <our ip prefix>.176:50010 to delete  blk_-4237880568969698703_13404582
>>>>>> 2010-11-10 21:42:42,598 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK*
>>>>>> ask
>>>>>> <our ip prefix>.173:50010 to delete  blk_719995703651674050_13393533
>>>>>> blk_2844104197159113873_13393516 blk_6711472063024831893_13393527
>>>>>> blk_240592081553699046_13393054 blk_6933038421638165347_13393232
>>>>>> blk_-1288897671244436593_13393351 blk_1408176921409258101_13393152
>>>>>> blk_-4237880568969698703_13404582 blk_-4188858742780299895_13393410
>>>>>> blk_-2670318277937301225_13393450 blk_-1858397614006984730_13393480
>>>>>> blk_6292330466651227068_13393505
>>>>>> 2010-11-10 21:42:42,598 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK*
>>>>>> ask
>>>>>> <our ip prefix>.169:50010 to delete  blk_719995703651674050_13393533
>>>>>> blk_6711472063024831893_13393527 blk_-514463419649319075_13404582
>>>>>> blk_-1980639884885154260_13404582 blk_240592081553699046_13393054
>>>>>> blk_8520588884031249451_13393428 blk_-6620589068796293265_13393089
>>>>>> blk_6933038421638165347_13393232 blk_-1288897671244436593_13393351
>>>>>> blk_-3581082286710707012_13393556 blk_1408176921409258101_13393152
>>>>>> blk_-1858397614006984730_13393480 blk_6292330466651227068_13393505
>>>>>> blk_2844104197159113873_13393516 blk_-4237880568969698703_13404582
>>>>>> blk_-4188858742780299895_13393410 blk_-2670318277937301225_13393450
>>>>>>
>>>>>>
>>>>>> I'm still not clear why those deletes are happening though.  Will
read
>>>>>> through the logs again at those locations.  But if anyone's got any
>>>>>> ideas,
>>>>>> pls chime in.
>>>>>>
>>>>>> DR

Mime
View raw message