hadoop-common-dev mailing list archives

From Aishwarya Venkataraman <avenk...@cs.ucsd.edu>
Subject Re: Viewing hadoop mapper output
Date Fri, 07 Oct 2011 06:43:31 GMT
Robert,

My mapper job fails. I am basically trying to run a crawler on Hadoop, and
Hadoop kills the crawler (mapper) if it has not heard from it for a certain
timeout period. But I already have a timeout set in my mapper (500 seconds),
which is less than Hadoop's timeout (900 seconds). The mapper just stalls
for some reason. My mapper code is as follows:

while read -r line; do
  # read -r keeps backslashes in the input line intact; quoting "$result"
  # preserves the fetched page's whitespace and newlines.
  result="$(wget -O - --timeout=500 "http://$line" 2>&1)"
  echo "$result"
done
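One likely cause of the stall: wget's --timeout bounds each individual network operation (DNS lookup, connect, each read), not the whole download, and wget retries failed downloads by default, so a single bad URL can hold the task well past 500 seconds. Hadoop streaming also treats stderr lines of the form reporter:status:<message> as task status updates, which reset the task timeout. A hedged sketch of a mapper along those lines (fetch_url is a hypothetical helper, and the 60-second timeout is an illustrative choice, not from the original script):

```shell
#!/bin/sh
# Hypothetical variant of mapper.sh for a Hadoop streaming job.

fetch_url() {
  url=$1
  # Heartbeat on stderr: Hadoop streaming interprets "reporter:status:<msg>"
  # as a status update and resets mapred.task.timeout for this attempt.
  echo "reporter:status:fetching $url" >&2
  # --tries=1 disables wget's default retries; --timeout bounds each
  # network operation (DNS, connect, read), not the total download time.
  wget -O - --tries=1 --timeout=60 "http://$url" 2>/dev/null
}

# Hadoop feeds the mapper one input line (here, one host) per line on stdin.
while read -r line; do
  fetch_url "$line"
done
```

Logging the URL in the status line also shows, in the web UI, which site the task was on when it stalled.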

Any idea why my mapper is stalling?

I don't see the difference between the command you gave and the one I ran.
I am not running in local mode. Is there some way I can get intermediate
mapper outputs? I would like to see which site the mapper is stalling on.
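Following Robert's earlier suggestion to turn off the reducers, one way to inspect the raw per-mapper output is to run the job with zero reduce tasks and read the part files directly. A sketch, assuming the same 0.20.2 streaming jar and input paths as the commands in this thread (mapout is a hypothetical output directory; the -D generic option must come before the streaming options):

```shell
hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar \
    -D mapred.reduce.tasks=0 \
    -file ~/mapper.sh -mapper ./mapper.sh \
    -input ../foo.txt -output mapout

# With no reducers, each mapper writes its stdout to its own part file.
hadoop fs -ls mapout
hadoop fs -cat 'mapout/part-*' | head
```

The last part file to appear (or the missing one) points at the input split whose mapper stalled.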

Thanks,
Aishwarya

On Thu, Oct 6, 2011 at 1:41 PM, Robert Evans <evans@yahoo-inc.com> wrote:

> Aishwarya,
>
> Are you running in local mode?  If not, you probably want to run
>
> hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar -file
> ~/mapper.sh -mapper ./mapper.sh -input ../foo.txt -output output
>
> You may also want to run hadoop fs -ls output/* to see what files were
> produced.  If your mappers failed for some reason then there will be no
> files in the output directory. And you may want to look at the stderr logs
> for your processes through the web UI.
>
> --Bobby Evans
>
> On 10/6/11 3:30 PM, "Aishwarya Venkataraman" <avenkata@cs.ucsd.edu> wrote:
>
> I ran the following (I am using IdentityReducer) :
>
> ./hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar -file
> ~/mapper.sh -mapper ~/mapper.sh -input ../foo.txt -output output
>
> When I do
> ./hadoop dfs -cat output/* I do not see any output on screen. Is this how
> I view the mapper's output?
>
> Thanks,
> Aishwarya
>
> On Thu, Oct 6, 2011 at 12:37 PM, Robert Evans <evans@yahoo-inc.com> wrote:
>
> > A streaming job's stderr is logged for the task, but its stdout is what is
> > sent to the reducer.  The simplest way to get it is to turn off the
> > reducers, and then look at the output in HDFS.
> >
> > --Bobby Evans
> >
> > On 10/6/11 1:16 PM, "Aishwarya Venkataraman" <avenkata@cs.ucsd.edu>
> wrote:
> >
> > Hello,
> >
> > I want to view the mapper output for a given Hadoop streaming job (that
> > runs a shell script). However, I am not able to find it in any log
> > files. Where should I look for it?
> >
> > Thanks,
> > Aishwarya
> >
> >
>
>
> --
> Thanks,
> Aishwarya Venkataraman
> avenkata@cs.ucsd.edu
> Graduate Student | Department of Computer Science
> University of California, San Diego
>
>


-- 
Thanks,
Aishwarya Venkataraman
avenkata@cs.ucsd.edu
Graduate Student | Department of Computer Science
University of California, San Diego
