From: Raghu Angadi
Date: Fri, 01 Dec 2006 15:12:57 -0800
Message-ID: <4570B6F9.50700@yahoo-inc.com>
To: hadoop-dev@lucene.apache.org
Subject: Re: minor change in dataNode handling of multiple directories.

Does anyone have a config where some data directories don't exist at all?
The current datanode does not work in that case: it throws an IOException.
The current code only tolerates a directory that exists but could not be
locked.

Yes, we could decide not to throw the exception if the directory does not
exist. For now I am just going to keep the same behavior as before.

Raghu.

> Konstantin Shvachko wrote:
>> Good point.
>> I think we should document it (Javadoc?), making it a feature rather
>> than a side effect.
>>
>> Bryan A. P. Pendleton wrote:
>>
>>> I would prefer this proposal not be implemented. The current way
>>> things work makes it possible to configure, centrally, a list of all
>>> directories that _could_ be used for storage. Since there's no easy
>>> way to do per-node configurations (nor would it be desirable, IMO, in
>>> this case), the directories config ends up being the list of all
>>> possibly usable directories. Many of my cluster nodes are configured
>>> using "rocksclusters": they will have a uniform set of mounts created,
>>> one for each physical drive, at boot/re-install. If I specify in my
>>> config the list of all directories up to the largest number of drives
>>> a machine will ever have, then I get easy drop-in use, regardless of
>>> variations in nodes in the cluster. I have been relying on the current
>>> behavior to keep me sane.
>>>
>>> OTOH, I wouldn't oppose making this the default behavior, with a
>>> configuration param that would set things back to the old behavior.
>>>
>>> On 11/29/06, Raghu Angadi wrote:
>>>
>>>> As part of the "Version upgrade" related changes, I am thinking of
>>>> strictly requiring that the datanode be able to lock _all_ the
>>>> configured directories instead of any one of them.
>>>>
>>>> Currently, if multiple data directories are specified for a datanode,
>>>> it tries to lock a file in each of the directories. If it fails to
>>>> lock some of the directories, it will use the directories that it
>>>> could lock. Looks like this flexibility was included mainly for
>>>> convenience in the config file.
>>>>
>>>> This might not affect anyone; let us know your opinions.
>>>>
>>>> Note that all directories have the same storage id. So each
>>>> individual directory is not complete by itself but a part of one
>>>> storage.
>>>>
>>>> Raghu.
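
To make the two behaviors discussed in this thread concrete, here is a rough sketch of the directory-locking logic. It is not the actual datanode code: the class name, method names, the lock file name "storage.lock" and the requireAll flag are all made up for illustration. It only assumes what the thread states: a missing directory raises an IOException, a directory that exists but cannot be locked is skipped under the current behavior, and the proposed strict behavior would require every configured directory to be locked.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; names do not come from the Hadoop source.
class DataDirLockSketch {
    // Assumed lock file name, chosen for this example.
    static final String LOCK_FILE = "storage.lock";

    /** Returns a lock on dir, or null if the directory exists but is already locked. */
    static FileLock tryLockDir(File dir) throws IOException {
        if (!dir.isDirectory()) {
            // Behavior described in the thread: a missing directory is an error.
            throw new IOException("Data directory does not exist: " + dir);
        }
        RandomAccessFile raf = new RandomAccessFile(new File(dir, LOCK_FILE), "rw");
        FileLock lock = null;
        try {
            lock = raf.getChannel().tryLock(); // null if another process holds it
        } catch (OverlappingFileLockException e) {
            // Already locked inside this JVM; treated like "held by someone else".
        } finally {
            if (lock == null) {
                raf.close();
            }
        }
        return lock;
    }

    /**
     * requireAll == false: use whichever directories could be locked (current behavior).
     * requireAll == true:  fail unless every configured directory is locked (proposal).
     */
    static Map<File, FileLock> lockDataDirs(List<File> configured, boolean requireAll)
            throws IOException {
        Map<File, FileLock> locked = new LinkedHashMap<>();
        boolean ok = false;
        try {
            for (File dir : configured) {
                FileLock lock = tryLockDir(dir);
                if (lock != null) {
                    locked.put(dir, lock);
                } else if (requireAll) {
                    throw new IOException("Could not lock data directory: " + dir);
                }
            }
            if (locked.isEmpty()) {
                throw new IOException("No usable data directory");
            }
            ok = true;
            return locked;
        } finally {
            if (!ok) {
                release(locked); // do not keep partial locks on failure
            }
        }
    }

    /** Releases every lock and closes the underlying lock files. */
    static void release(Map<File, FileLock> locked) throws IOException {
        for (FileLock lock : locked.values()) {
            lock.release();
            lock.channel().close();
        }
        locked.clear();
    }
}
```

With requireAll set to false, a centrally managed list of every directory that could exist keeps working as long as each listed directory is present on the node; tolerating missing directories as well would mean turning the IOException in tryLockDir into a skip, which is the change left open above.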