From: Eric Baldeschwieler
Subject: Re: [jira] Commented: (HADOOP-124) don't permit two datanodes to run from same dfs.data.dir
Date: Wed, 17 May 2006 20:21:57 -0700
To: hadoop-dev@lucene.apache.org

Why not store the cluster ID in the data node?

On May 17, 2006, at 6:39 PM, Konstantin Shvachko (JIRA) wrote:

> [ http://issues.apache.org/jira/browse/HADOOP-124?page=comments#action_12412273 ]
>
> Konstantin Shvachko commented on HADOOP-124:
> --------------------------------------------
>
> For future development in this direction: we should persistently store
> on the name node all storage IDs to which the name node has ever
> assigned blocks. With that knowledge the name node can reject blocks
> from any newly registered data storage that is not on the name node's
> list. In other words, when a data node registers a NEW data storage it
> should not report any blocks from that storage, and the name node can
> verify that effectively, since it never assigned any blocks to this
> storage.
> This would prevent us from accidentally connecting data nodes that
> represent different clusters (DFS instances).
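For concreteness, the name-node-side check Konstantin describes might
look something like the sketch below. StorageRegistry and its method
names are made up for illustration, not the actual Hadoop API, and the
real design would persist the ID set with the namespace image rather
than hold it only in memory:

import java.util.HashSet;
import java.util.Set;

// Illustrative only: StorageRegistry and these method names are
// assumptions, not the real Hadoop API.
public class StorageRegistry {

    // Every storage ID the name node has ever assigned blocks to.
    // In the real design this set would be persisted with the image.
    private final Set<String> knownStorageIds = new HashSet<String>();

    // Called whenever the name node assigns blocks to a storage.
    public synchronized void recordAssignment(String storageId) {
        knownStorageIds.add(storageId);
    }

    // A block report from an unknown storage must be empty: the name
    // node never wrote to it, so any blocks it claims are suspect
    // (e.g. a data node that belongs to a different DFS instance).
    public synchronized boolean acceptBlockReport(String storageId,
                                                  int reportedBlocks) {
        if (knownStorageIds.contains(storageId)) {
            return true;                    // known storage, accept
        }
        if (reportedBlocks == 0) {
            knownStorageIds.add(storageId); // register the new storage
            return true;
        }
        return false;                       // reject foreign blocks
    }
}

The key property is that a storage becomes "known" only through the
name node itself, so blocks arriving from any other source are
rejected rather than folded into the replication accounting.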
>> don't permit two datanodes to run from same dfs.data.dir
>> --------------------------------------------------------
>>
>>          Key: HADOOP-124
>>          URL: http://issues.apache.org/jira/browse/HADOOP-124
>>      Project: Hadoop
>>         Type: Bug
>>   Components: dfs
>>     Versions: 0.2
>>  Environment: ~30 node cluster
>>     Reporter: Bryan Pendleton
>>     Assignee: Konstantin Shvachko
>>     Priority: Critical
>>      Fix For: 0.3
>>  Attachments: DatanodeRegister.txt, DirNotSharing.patch
>>
>> DFS files are still rotting.
>> I suspect there is a problem with block accounting / detection of
>> identical hosts in the namenode. I have 30 physical nodes with varying
>> numbers of local disks, so after a full restart "bin/hadoop dfs -report"
>> shows 80 nodes. However, when I discovered the problem (which cost about
>> 500 GB of temporary data to missing blocks in some of the larger
>> chunks), -report showed 96 nodes. I suspect extra datanodes were somehow
>> running against the same paths, the namenode counted them as replicated
>> instances, the blocks then appeared over-replicated, and one of the
>> datanodes was told to delete its local copy, so the block was actually
>> lost.
>> I will debug it further the next time the situation arises. This is at
>> least the 5th time since January that I've had a large amount of file
>> data "rot" in DFS.
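As for the issue title itself, the straightforward guard is for each
datanode to take an exclusive OS-level lock on a file inside
dfs.data.dir at startup and hold it for the life of the process, so a
second datanode pointed at the same directory fails fast instead of
corrupting the block accounting. A sketch of that idea follows; the
"in_use.lock" file name and this exact mechanism are assumptions, not
necessarily what the attached DirNotSharing.patch implements:

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// Illustrative only: the class and the "in_use.lock" name are
// assumptions, not necessarily what the attached patch does.
public class DataDirLock {

    private RandomAccessFile file;
    private FileLock lock;

    // Fails if another process already holds the lock on this dir.
    public void lock(File dataDir) throws IOException {
        file = new RandomAccessFile(new File(dataDir, "in_use.lock"), "rw");
        lock = file.getChannel().tryLock();
        if (lock == null) {
            file.close();
            throw new IOException("Data dir " + dataDir
                + " is already in use by another datanode");
        }
        // Held until the process exits; the OS releases it on a crash,
        // so a stale lock file never blocks a clean restart.
    }
}

Because the lock dies with the process, this also distinguishes a
crashed datanode from a live duplicate without any manual cleanup.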