hadoop-common-user mailing list archives

From Mike Kendall <mkend...@justin.tv>
Subject Re: How to call method after all map jobs on slaves nodes are done
Date Fri, 13 Nov 2009 17:20:55 GMT
I don't know anything about MapRunnable, but this would be pretty easy to do
with a bash script.  All you do is list out your commands in a text
file and run that file...


It sounds like you're going to want to do something like...


hadoop jar mapjobs
yourProcessor tempDir outDir
hadoop dfs -copyFromLocal outDir somewhereOnDfs
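Fleshed out, a minimal wrapper script along those lines might look like the
sketch below. The jar, class, program, and directory names (mapjobs.jar,
MyDriver, yourProcessor, tempDir, outDir, /user/you/out) are all hypothetical
placeholders for your own artifacts; this needs a live cluster to run, so
treat it as a template, not something tested against your setup:

```shell
#!/usr/bin/env bash
# Sketch of the wrapper script described above. All names here are
# placeholders: mapjobs.jar, MyDriver, yourProcessor, tempDir, outDir,
# and /user/you/out should be replaced with your own jar, classes, and paths.
set -e   # abort on the first failed step, so partial results are never copied

# 1. Run the MapReduce job. 'hadoop jar' blocks until the whole job
#    (every map and reduce task) has finished, which is exactly the
#    "all maps are done" hook you are looking for.
hadoop jar mapjobs.jar MyDriver

# 2. Post-process the files the maps left in the temporary directory.
yourProcessor tempDir outDir

# 3. Copy the processed output into HDFS.
hadoop dfs -copyFromLocal outDir /user/you/out
```

Because `hadoop jar` only returns after the job completes, step 2 is
guaranteed to see the final output of every map task.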

How does that sound?


(PS: 10 nodes for 10 GB jobs is way overkill. I have 4 nodes for TBs of
data.)
On Fri, Nov 13, 2009 at 8:43 AM, Hrishikesh Agashe <
hrishikesh_agashe@persistent.co.in> wrote:

> Hi,
> I am implementing the MapRunnable interface to create the Map jobs.
> I have large data set for processing. (Data size is around 10 GB).
> I have 1 master and 10 slaves cluster.
> When I run my program, hadoop will process data successfully.
> After processing, I am collecting all data (all are files) in hadoop
> temporary directory.
> Now my requirement is when all maps are completed on each node I want to
> call one method which will process the data from temporary directory and
> finally copy those files on HDFS.
> Is there any way to do this?
> --Hrishi
