hadoop-mapreduce-user mailing list archives

From Nitin Pawar <nitinpawar...@gmail.com>
Subject Re: How to tell my Hadoop cluster to read data from an external server
Date Tue, 26 Mar 2013 09:42:16 GMT
You are looking at a two-step workflow here.

The first unit of your workflow downloads the file from the external server,
writes it to HDFS, and returns the file path.
The second unit of your workflow reads that input path and processes the data
according to your business logic in MR.
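The two units above can be sketched in a few lines of Python. This is only an illustration, not code from the thread: the URL, paths, jar, and class names are all made-up placeholders, and it assumes the `hdfs` and `hadoop` CLI tools are on the PATH of the machine running it.

```python
import subprocess
import urllib.request

def fetch_to_hdfs(source_url, local_tmp, hdfs_path):
    """Unit 1: download the file from the external server, then copy it
    into HDFS. Returns the HDFS path so the MR step knows its input.
    All arguments are placeholders for your real locations."""
    urllib.request.urlretrieve(source_url, local_tmp)
    # -f overwrites an existing file at hdfs_path on re-runs
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_tmp, hdfs_path],
                   check=True)
    return hdfs_path

def build_mr_command(jar, main_class, input_path, output_path):
    """Unit 2, as a command line: run the MR job against the ingested
    path. Returned as a list suitable for subprocess.run."""
    return ["hadoop", "jar", jar, main_class, input_path, output_path]

# Example wiring of the two units (paths are hypothetical):
# path = fetch_to_hdfs("http://example.com/data.csv",
#                      "/tmp/data.csv", "/user/nikhil/in/data.csv")
# subprocess.run(build_mr_command("job.jar", "com.example.MyJob",
#                                 path, "/user/nikhil/out"), check=True)
```

Keeping the ingest and the MR launch as separate functions mirrors the two workflow units, so a workflow engine can retry either step independently.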

You can look at Cascading for this simple approach; it is easy to build
simple workflow applications with it.
Other options are Oozie, or you may try Crunch (it is very new, but easy to
use as well).
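If you go the Oozie route, the two units map onto two chained actions in a `workflow.xml`. A rough sketch follows; the action names, script name, and `${...}` parameters are all invented for illustration, and the exact schema versions and properties should be checked against the Oozie documentation for your release:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="ingest-then-mr">
  <start to="fetch-file"/>

  <!-- unit 1: a shell action pulls the file from the external
       server and puts it into HDFS (fetch.sh is hypothetical) -->
  <action name="fetch-file">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>fetch.sh</exec>
      <file>fetch.sh</file>
    </shell>
    <ok to="process-data"/>
    <error to="fail"/>
  </action>

  <!-- unit 2: the MR action reads the ingested path -->
  <action name="process-data">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>${ingestedPath}</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${outputPath}</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Workflow failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```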

On Tue, Mar 26, 2013 at 2:49 PM, Agarwal, Nikhil wrote:

> Hi,
>
> I have a Hadoop cluster up and running. I want to submit an MR job to it,
> but the input data is kept on an external server (outside the Hadoop
> cluster). Can anyone please suggest how I can tell my Hadoop cluster to
> load the input data from the external server and then run MR on it?
>
> Thanks & Regards,
> Nikhil

Nitin Pawar
