Subject: Re: Viewing hadoop mapper output
From: Aishwarya Venkataraman <avenkata@eng.ucsd.edu>
To: common-dev@hadoop.apache.org
Date: Thu, 6 Oct 2011 23:43:31 -0700

Robert,

My mapper job fails. I am basically trying to run a crawler on Hadoop, and
Hadoop kills the crawler (mapper) if it has not heard from it for a certain
timeout period. But I already have a timeout set in my mapper (500 seconds),
which is less than Hadoop's timeout (900 seconds). The mapper just stalls
for some reason. My mapper code is as follows:

    while read line; do
      result="`wget -O - --timeout=500 http://$line 2>&1`"
      echo $result
    done

Any idea why my mapper is getting stalled? I don't see the difference
between the command you gave and the one I ran. I am not running in local
mode.

Is there some way I can get the intermediate mapper outputs? I would like
to see for which site the mapper is getting stalled.

Thanks,
Aishwarya
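One plausible cause, though the thread never confirms it: wget retries
failed downloads up to 20 times by default, and --timeout bounds each DNS
lookup, connect, and read individually rather than the transfer as a whole,
so a single slow or trickling host can keep the mapper silent for longer
than Hadoop's 900-second task timeout. Hadoop Streaming also lets a script
prove it is alive by writing lines of the form reporter:status:<message>
to stderr. A minimal sketch of a mapper that combines both ideas (the
single-try flag and the 120-second per-fetch timeout are illustrative
choices, not values from the thread):

    #!/bin/bash
    # Streaming mapper sketch: bound each fetch and report liveness.
    while read line; do
      # -t 1 disables wget's default retries (up to 20) so one bad host
      # cannot hold the task much past its own per-fetch timeout.
      result=$(wget -O - -t 1 --timeout=120 "http://$line" 2>&1)
      echo "$result"
      # reporter:status: lines on stderr count as progress and reset
      # Hadoop's task-timeout clock, and they name the site being fetched,
      # which answers the "which site is it stuck on" question.
      echo "reporter:status:fetched $line" >&2
    done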
On Thu, Oct 6, 2011 at 1:41 PM, Robert Evans wrote:

> Aishwarya,
>
> Are you running in local mode? If not, you probably want to run
>
> hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar -file
> ~/mapper.sh -mapper ./mapper.sh -input ../foo.txt -output output
>
> You may also want to run hadoop fs -ls output/* to see what files were
> produced. If your mappers failed for some reason then there will be no
> files in the output directory. And you may want to look at the stderr
> logs for your processes through the web UI.
>
> --Bobby Evans
>
> On 10/6/11 3:30 PM, "Aishwarya Venkataraman" wrote:
>
> I ran the following (I am using IdentityReducer):
>
> ./hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar -file
> ~/mapper.sh -mapper ~/mapper.sh -input ../foo.txt -output output
>
> When I do ./hadoop dfs -cat output/* I do not see any output on screen.
> Is this how I view the output of the mapper?
>
> Thanks,
> Aishwarya
>
> On Thu, Oct 6, 2011 at 12:37 PM, Robert Evans wrote:
>
> > A streaming job's stderr is logged for the task, but its stdout is
> > what is sent to the reducer. The simplest way to get it is to turn
> > off the reducers, and then look at the output in HDFS.
> >
> > --Bobby Evans
> >
> > On 10/6/11 1:16 PM, "Aishwarya Venkataraman" wrote:
> >
> > Hello,
> >
> > I want to view the mapper output for a given Hadoop streaming job
> > (that runs a shell script). However I am not able to find this in any
> > of the log files. Where should I look for this?
> >
> > Thanks,
> > Aishwarya

--
Thanks,
Aishwarya Venkataraman
avenkata@cs.ucsd.edu
Graduate Student | Department of Computer Science
University of California, San Diego
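Robert's suggestion of turning off the reducers corresponds, for a
streaming job, to the -numReduceTasks 0 option: with zero reducers each
mapper's stdout is written straight to part-* files in the output
directory, so the map output can be inspected directly. A sketch using the
same jar and paths as the commands above (raising mapred.task.timeout is
optional and shown only for illustration; it is in milliseconds):

    # Map-only run: mapper stdout lands directly in HDFS as part-* files.
    # Use a fresh output directory; Hadoop refuses to overwrite one that
    # already exists.
    hadoop jar ../contrib/streaming/hadoop-0.20.2-streaming.jar \
      -D mapred.task.timeout=1800000 \
      -file ~/mapper.sh -mapper ./mapper.sh \
      -input ../foo.txt -output output \
      -numReduceTasks 0

    # Confirm files were produced and look at the per-mapper output.
    # (Single quotes stop the local shell from expanding the glob.)
    hadoop fs -ls output
    hadoop fs -cat 'output/part-*' | head

Note that the -D generic option must come before the streaming-specific
options, and that a task's stderr (where the wget errors and any
reporter:status: lines end up) is viewable per attempt through the
JobTracker web UI.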