Subject: Re: Viewing hadoop mapper output
From: Aishwarya Venkataraman <avenkata@eng.ucsd.edu>
To: common-dev@hadoop.apache.org
Date: Thu, 6 Oct 2011 23:43:31 -0700

Robert,

My mapper job fails. I am basically trying to run a crawler on Hadoop, and
Hadoop kills the crawler (mapper) if it has not heard from it for a certain
timeout period. But I already have a timeout set in my mapper (500 seconds),
which is less than Hadoop's timeout (900 seconds). The mapper just stalls
for some reason. My mapper code is as follows:

    while read line; do
      result="`wget -O - --timeout=500 http://$line 2>&1`"
      echo $result
    done

Any idea why my mapper is getting stalled? I don't see the difference
between the command you gave and the one I ran. I am not running in local
mode.

Is there some way I can get the intermediate mapper outputs? I would like
to see for which site the mapper is getting stalled.

Thanks,
Aishwarya
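One plausible cause, though the thread never confirms it: wget retries
failed downloads up to 20 times by default, and --timeout bounds each DNS
lookup, connect, and read individually rather than the transfer as a whole,
so a single slow or trickling host can keep the mapper silent for longer
than Hadoop's 900-second task timeout. Hadoop Streaming also lets a script
prove it is alive by writing lines of the form reporter:status:<message>
to stderr. A minimal sketch of a mapper that combines both ideas (the
single-try flag and the 120-second per-fetch timeout are illustrative
choices, not values from the thread):

    #!/bin/bash
    # Streaming mapper sketch: bound each fetch and report liveness.
    while read line; do
      # -t 1 disables wget's default retries (up to 20) so one bad host
      # cannot hold the task much past its own per-fetch timeout.
      result=$(wget -O - -t 1 --timeout=120 "http://$line" 2>&1)
      echo "$result"
      # reporter:status: lines on stderr count as progress and reset
      # Hadoop's task-timeout clock, and they name the site being fetched,
      # which answers the "which site is it stuck on" question.
      echo "reporter:status:fetched $line" >&2
    done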
On Thu, Oct 6, 2011 at 1:41 PM, Robert Evans wrote:

> Aishwarya,
>
> Are you running in local mode? If not, you probably want to run
>
> hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar -file
> ~/mapper.sh -mapper ./mapper.sh -input ../foo.txt -output output
>
> You may also want to run hadoop fs -ls output/* to see what files were
> produced. If your mappers failed for some reason then there will be no
> files in the output directory. And you may want to look at the stderr
> logs for your processes through the web UI.
>
> --Bobby Evans
>
> On 10/6/11 3:30 PM, "Aishwarya Venkataraman" wrote:
>
> I ran the following (I am using IdentityReducer):
>
> ./hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar -file
> ~/mapper.sh -mapper ~/mapper.sh -input ../foo.txt -output output
>
> When I do ./hadoop dfs -cat output/* I do not see any output on screen.
> Is this how I view the output of the mapper?
>
> Thanks,
> Aishwarya
>
> On Thu, Oct 6, 2011 at 12:37 PM, Robert Evans wrote:
>
> > A streaming job's stderr is logged for the task, but its stdout is
> > what is sent to the reducer. The simplest way to get it is to turn
> > off the reducers, and then look at the output in HDFS.
> >
> > --Bobby Evans
> >
> > On 10/6/11 1:16 PM, "Aishwarya Venkataraman" wrote:
> >
> > Hello,
> >
> > I want to view the mapper output for a given Hadoop streaming job
> > (that runs a shell script). However I am not able to find this in any
> > of the log files. Where should I look for this?
> >
> > Thanks,
> > Aishwarya

--
Thanks,
Aishwarya Venkataraman
avenkata@cs.ucsd.edu
Graduate Student | Department of Computer Science
University of California, San Diego
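Robert's suggestion of turning off the reducers corresponds, for a
streaming job, to the -numReduceTasks 0 option: with zero reducers each
mapper's stdout is written straight to part-* files in the output
directory, so the map output can be inspected directly. A sketch using the
same jar and paths as the commands above (raising mapred.task.timeout is
optional and shown only for illustration; it is in milliseconds):

    # Map-only run: mapper stdout lands directly in HDFS as part-* files.
    # Use a fresh output directory; Hadoop refuses to overwrite one that
    # already exists.
    hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar \
      -D mapred.task.timeout=1800000 \
      -file ~/mapper.sh -mapper ./mapper.sh \
      -input ../foo.txt -output output \
      -numReduceTasks 0

    # Confirm files were produced and look at the per-mapper output.
    # (Single quotes stop the local shell from expanding the glob.)
    hadoop fs -ls output
    hadoop fs -cat 'output/part-*' | head

Note that the -D generic option must come before the streaming-specific
options, and that a task's stderr (where the wget errors and any
reporter:status: lines end up) is viewable per attempt through the
JobTracker web UI.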