Date: Sat, 14 Feb 2009 15:45:31 -0800
Subject: Re: Race Condition?
From: Matei Zaharia
To: core-user@hadoop.apache.org

Have you logged the output of the dfs command to check whether the copy
always succeeds?

On Sat, Feb 14, 2009 at 2:46 PM, S D wrote:

> In my Hadoop 0.19.0 program each map function is assigned a directory
> (representing a data location in my S3 datastore). The first thing each map
> function does is copy the particular S3 data to the local machine that the
> map task is running on and then begin processing the data; e.g.,
>
>   command = "hadoop dfs -copyToLocal #{s3dir} #{localdir}"
>   system "#{command}"
>
> In the above, copying the directory "s3dir" creates "localdir" - my
> expectation is that "localdir" is created in the work directory for the
> particular task attempt. Following this copy command I then run a function
> that processes the data; e.g.,
>
>   processData(localdir)
>
> In some instances my map/reduce program crashes and when I examine the logs
> I get a message saying that "localdir" cannot be found. This confuses me
> since the hadoop shell command above is blocking, so localdir should
> exist by the time processData() is called. I've found that if I add some
> diagnostic lines prior to processData(), such as puts statements to print
> out variables, I never run into the problem of localdir not being found. It
> is almost as if localdir needs time to be created before the call to
> processData().
>
> Has anyone encountered anything like this? Any suggestions on what could be
> wrong are appreciated.
>
> Thanks,
> John
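
One way to act on the suggestion to log the dfs command's output is sketched below in Ruby, the language of the quoted snippet. This is only a rough sketch, not something from the thread: the helper names run_and_log and copy_to_local are hypothetical, and it assumes the hadoop executable is on the PATH of the task. It captures the combined stdout/stderr and the exit status of the copy, and refuses to continue if the command failed or the directory is still missing.

```ruby
require "open3"

# Hypothetical helper (not from the thread): runs a command without a shell,
# logs its combined stdout/stderr and exit status, and returns
# [succeeded, output].
def run_and_log(*argv, log: $stderr)
  output, status = Open3.capture2e(*argv)
  log.puts "command: #{argv.join(' ')}"
  log.puts "exit status: #{status.exitstatus.inspect}"
  log.puts output unless output.empty?
  [status.success?, output]
end

# Hypothetical wrapper around the copy from the original message.
# Assumes "hadoop" is on the PATH of the map task.
def copy_to_local(s3dir, localdir, log: $stderr)
  ok, _output = run_and_log("hadoop", "dfs", "-copyToLocal", s3dir, localdir, log: log)
  # Fail loudly if the command failed or the directory is still missing.
  raise "copy of #{s3dir} to #{localdir} failed" unless ok && File.directory?(localdir)
  localdir
end
```

With this in place the map function could call processData(copy_to_local(s3dir, localdir)), so a failed copy shows up in the task logs with its exit status and output instead of as a missing directory later on.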