hadoop-common-user mailing list archives

From Matei Zaharia <ma...@cloudera.com>
Subject Re: Race Condition?
Date Sat, 14 Feb 2009 23:45:31 GMT
Have you logged the output of the dfs command to see whether the copy
always succeeds?
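
One way to capture that, as a rough sketch: run the command through Open3
so that stdout, stderr, and the exit status all end up in the task log, and
only call processData() if the copy reported success and the directory
exists. The helper name run_and_log is made up for illustration; s3dir,
localdir, and processData are the names from your message.

```ruby
require "open3"

# Hypothetical helper: runs a shell command, logs its combined
# stdout/stderr and exit status, and returns [succeeded, output].
def run_and_log(command, log = $stderr)
  output, status = Open3.capture2e(command)
  log.puts "command: #{command}"
  log.puts "exit status: #{status.exitstatus.inspect}"
  log.puts output unless output.empty?
  [status.success?, output]
end

# Intended use inside the map function (names taken from the original post):
#   ok, _ = run_and_log("hadoop dfs -copyToLocal #{s3dir} #{localdir}")
#   raise "copy of #{s3dir} failed" unless ok && File.directory?(localdir)
#   processData(localdir)
```

If the log shows a non-zero exit status on the runs that crash, the problem
is a failed copy rather than a timing issue.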

On Sat, Feb 14, 2009 at 2:46 PM, S D <sd.codewarrior@gmail.com> wrote:

> In my Hadoop 0.19.0 program each map function is assigned a directory
> (representing a data location in my S3 datastore). The first thing each map
> function does is copy the particular S3 data to the local machine that the
> map task is running on and then begin processing the data; e.g.,
>
> command = "hadoop dfs -copyToLocal #{s3dir} #{localdir}"
> system "#{command}"
>
> In the above, "s3dir" is the S3 directory whose copy creates "localdir" - my
> expectation is that "localdir" is created in the work directory for the
> particular task attempt. Following this copy command I then run a function
> that processes the data; e.g.,
>
> processData(localdir)
>
> In some instances my map/reduce program crashes and when I examine the logs
> I get a message saying that "localdir" cannot be found. This confuses me
> since the hadoop shell command above is blocking so that localdir should
> exist by the time processData() is called. I've found that if I add in some
> diagnostic lines prior to processData(), such as puts statements to print
> out variables, I never run into the problem of localdir not being found. It
> is almost as if localdir needs time to be created before the call to
> processData().
>
> Has anyone encountered anything like this? Any suggestions on what could be
> wrong are appreciated.
>
> Thanks,
> John
>
