hadoop-common-dev mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: [jira] Commented: (HADOOP-124) Files still rotting in DFS of latest Hadoop
Date Sat, 08 Apr 2006 04:49:04 GMT
Both seem like good ideas.

On Apr 7, 2006, at 11:21 AM, Owen O'Malley (JIRA) wrote:

>     [ http://issues.apache.org/jira/browse/HADOOP-124?page=comments#action_12373675 ]
>
> Owen O'Malley commented on HADOOP-124:
> --------------------------------------
>
> It seems like it would help to have the datanode generate a unique
> identifier the first time it is run, and save it in the data
> directory. On restart, the datanode would reuse the identifier from
> the data directory, and the namenode would then be able to complain
> about multiple instances of the same datanode.
>
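
(A minimal sketch of the scheme Owen describes, in the spirit of the
suggestion rather than any actual patch: the datanode mints an
identifier on first start and persists it in the data directory, so a
restart reports the same identity. The marker file name "storage_id"
and the ID format are assumptions, not Hadoop's real on-disk layout.)

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

public class DatanodeStorageId {

    // Hypothetical marker file kept inside the data directory.
    private static final String ID_FILE = "storage_id";

    // Returns this datanode's unique identifier: generated and saved on
    // first run, read back unchanged on every restart.
    public static String loadOrCreate(Path dataDir) throws IOException {
        Path idFile = dataDir.resolve(ID_FILE);
        if (Files.exists(idFile)) {
            return new String(Files.readAllBytes(idFile),
                              StandardCharsets.UTF_8).trim();
        }
        String id = "DS-" + UUID.randomUUID();   // ID format is an assumption
        Files.createDirectories(dataDir);
        Files.write(idFile, id.getBytes(StandardCharsets.UTF_8));
        return id;
    }

    public static void main(String[] args) throws IOException {
        // Two runs against the same directory print the same identifier.
        System.out.println(loadOrCreate(Paths.get("/tmp/dfs/data")));
    }
}

With a persistent identifier like this, the namenode can reject, or at
least log loudly, a second registration carrying an identifier it
already considers live.
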
>> Files still rotting in DFS of latest Hadoop
>> -------------------------------------------
>>
>>          Key: HADOOP-124
>>          URL: http://issues.apache.org/jira/browse/HADOOP-124
>>      Project: Hadoop
>>         Type: Bug
>>   Components: dfs
>>  Environment: ~30 node cluster
>>     Reporter: Bryan Pendleton
>>
>> DFS files are still rotting.
>> I suspect there's a problem with block accounting / detecting
>> identical hosts in the namenode. I have 30 physical nodes with
>> varying numbers of local disks, so my current 'bin/hadoop dfs
>> -report' shows 80 nodes after a full restart. However, when I
>> discovered the problem (which resulted in losing about 500GB of
>> temporary data because of missing blocks in some of the larger
>> chunks), -report showed 96 nodes. I suspect extra datanodes were
>> somehow running against the same paths, and that the namenode was
>> counting those as replicated instances. The blocks then showed up
>> as over-replicated, one of the datanodes was told to delete its
>> local copy, and the block was actually lost.
>> I will debug it more the next time the situation arises. This is
>> at least the 5th time I've had a large amount of file data "rot"
>> in DFS since January.
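
(To make the suspected accounting failure concrete, here is a toy
illustration, not namenode code: if one physical storage directory
registers under two datanode identities, a naive per-registration
replica count sees three copies of a twice-replicated block, flags it
as over-replicated, and the resulting delete can land on one of only
two real copies. The hostnames, ports, and paths below are made up.)

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReplicaAccounting {
    public static void main(String[] args) {
        int targetReplication = 2;

        // Registrations the namenode sees for one block; the first two
        // are the same physical copy reported under two datanode
        // identities (the suspected bug).
        List<String> holders = Arrays.asList(
            "node7:50010|/disk1/dfs/data",    // real copy
            "node7:50011|/disk1/dfs/data",    // duplicate datanode, same path
            "node12:50010|/disk2/dfs/data");  // the other real copy

        // Naive count: every registration looks like an independent
        // replica, so the block appears over-replicated and a delete
        // gets scheduled.
        System.out.println("counted replicas: " + holders.size()
            + (holders.size() > targetReplication
               ? " -> schedule a delete" : ""));

        // Counting distinct storage paths shows only two real copies,
        // so deleting one leaves the block one failure from loss.
        Set<String> storages = new HashSet<>();
        for (String h : holders) {
            storages.add(h.substring(h.indexOf('|') + 1));
        }
        System.out.println("real copies: " + storages.size());
    }
}
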

