hadoop-user mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: debugging hadoop streaming programs (first code)
Date Tue, 20 Nov 2012 18:39:03 GMT

The MapReduce web UI gives you all the information you need for debugging your code. Depending
on where your JobTracker is, you should go hit $JT_HOST_NAME:50030, and check the job link
as well as the task, task-attempt, and logs pages.

HTH
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Nov 20, 2012, at 5:33 AM, jamal sasha wrote:

> Hi,
>    If I just use pipes, then the code runs just fine... the issue is when I deploy it
on the cluster...
> :(
> Any suggestions on how to debug it?
> 
> 
> On Tue, Nov 20, 2012 at 7:42 AM, Mahesh Balija <balijamahesh.mca@gmail.com> wrote:
> Hi Jamal,
> 
>           If your MapReduce program is written in Java, you can debug it by running
your MR job in LocalJobRunner mode via Eclipse.
>           Or you can add debug statements (or even System.out.println calls) to your
code so that you can check where your job fails.
> 
>           I am NOT sure about Python, but one suggestion: run your Python
code (map unit & reduce unit) locally on your input data and see whether your logic has
any issues.
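One thing that does work for Python streaming jobs (a minimal sketch, assuming the standard Hadoop Streaming setup): anything the mapper or reducer writes to stderr goes to the task-attempt logs rather than the job output, and lines of the form `reporter:counter:group,counter,amount` on stderr increment job counters, both of which show up in the web UI. The helper names below are just illustrative:

```python
import sys

def debug(msg):
    # stderr is captured in the task-attempt log, not treated as map output
    sys.stderr.write("DEBUG: %s\n" % msg)

def count(group, counter, amount=1):
    # Hadoop Streaming recognizes this line format and increments a job counter
    sys.stderr.write("reporter:counter:%s,%s,%d\n" % (group, counter, amount))

# In mapper.py you would then do something like:
#   for line in sys.stdin:
#       if len(line.strip().split(",")) != 7:   # expected column count
#           count("mapper", "bad_records")
#           debug("skipping malformed line: %r" % line)
#           continue
#       ...
```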
> 
> Best,
> Mahesh Balija,
> Calsoft Labs.
> 
> 
> On Tue, Nov 20, 2012 at 6:50 AM, jamal sasha <jamalshasha@gmail.com> wrote:
> 
> Hi,
>   This is my first attempt to learn the map reduce abstraction.
> 
> My problem is as follows.
> I have a text file with lines like this:
> id1, id2, date, time, mrps, code, code2
> 3710100022400,1350219887, 2011-09-10, 12:39:38.000, 99.00, 1, 0 
> 3710100022400, 5045462785, 2011-09-06, 13:23:00.000, 70.63, 1, 0 
> 
> Now what I want to do is count the number of transactions happening in every half
hour between 7 am and 11 am.
> So here are the intervals:
> 
> 7:00-7:30 -> 0
> 7:30-8:00 -> 1
> 8:00-8:30 -> 2
> ...
> 10:30-11:00 -> 7
> So ultimately what I am doing is creating a 2d dictionary 
> d[id2][interval] = count_transactions.
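Those half-hour buckets can be computed directly from the time field (a minimal sketch; the 7:00 start and 30-minute width come from the intervals above):

```python
def interval_index(timestr):
    # map an HH:MM:SS.mmm time of day to a half-hour bucket:
    # 7:00-7:30 -> 0, 7:30-8:00 -> 1, ..., 10:30-11:00 -> 7
    # returns None for times outside the 7:00-11:00 window
    hh, mm = timestr.strip().split(":")[:2]
    minutes = int(hh) * 60 + int(mm)
    if 7 * 60 <= minutes < 11 * 60:
        return (minutes - 7 * 60) // 30
    return None
```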
> 
> My mappers and reducers are attached (sample input also).
> The code runs just fine if I run it via
> cat input.txt | python mapper.py | sort | python reducer.py
> 
> It gives me the output, but when I run it on the cluster it throws an error which is not
helpful (basically the terminal says job unsuccessful, reason NA).
> Any suggestion on what I am doing wrong?
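For what it's worth, the counting logic described above can be factored into testable functions and checked locally before submitting as a streaming job. This is only a sketch with hypothetical names; the actual attached mapper.py/reducer.py may differ:

```python
def map_line(line):
    # return (id2, interval) for rows in the 7:00-11:00 window, else None
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 7:
        return None
    try:
        hh, mm = fields[3].split(":")[:2]
        minutes = int(hh) * 60 + int(mm)
    except ValueError:
        return None                   # header row or malformed time field
    if not (7 * 60 <= minutes < 11 * 60):
        return None
    return fields[1], (minutes - 7 * 60) // 30

def reduce_pairs(pairs):
    # build d[id2][interval] = count_transactions from (id2, interval) pairs
    d = {}
    for id2, interval in pairs:
        by_interval = d.setdefault(id2, {})
        by_interval[interval] = by_interval.get(interval, 0) + 1
    return d

# As a streaming mapper, the emit step would be:
#   for line in sys.stdin:
#       kv = map_line(line)
#       if kv is not None:
#           print("%s\t%d" % kv)
```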
> 
> Jamal 

