hadoop-mapreduce-user mailing list archives

From Raj K Singh <rajkrrsi...@gmail.com>
Subject Re: Write and Read file through map reduce
Date Wed, 07 Jan 2015 08:09:07 GMT
You can configure your third MapReduce job with MultipleInputs and read
both files into that job. If one of the files is small, consider the
DistributedCache instead, which will give you optimal performance when
joining the datasets of file1 and file2. I would also recommend using a
job-scheduling API such as Oozie to make sure the third job kicks off only
once file1 and file2 are available on HDFS (the same can be done with a
shell script or a JobControl implementation).
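A minimal sketch of what the third job's driver could look like with
MultipleInputs, doing a reduce-side join: each input path gets its own
mapper, which tags records by origin before they meet in the reducer. The
HDFS paths (/data/file1, /data/file2, /data/joined), the key position
(first tab-separated field), and the class names are all illustrative
assumptions, not anything from the original jobs:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ThirdJobDriver {

    // Tags each record from file1; assumes the join key is the first
    // tab-separated field (an assumption about the file layout).
    public static class File1Mapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t", 2);
            ctx.write(new Text(parts[0]), new Text("F1\t" + value));
        }
    }

    // Same idea for file2, with a different tag.
    public static class File2Mapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t", 2);
            ctx.write(new Text(parts[0]), new Text("F2\t" + value));
        }
    }

    // Sees all tagged records sharing a key; the actual computation on
    // the joined values goes here.
    public static class JoinReducer
            extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            for (Text v : values) {
                ctx.write(key, v); // placeholder: emit tagged records as-is
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "join file1+file2");
        job.setJarByClass(ThirdJobDriver.class);

        // One mapper per input file, so records can be tagged by origin.
        MultipleInputs.addInputPath(job, new Path("/data/file1"),
                TextInputFormat.class, File1Mapper.class);
        MultipleInputs.addInputPath(job, new Path("/data/file2"),
                TextInputFormat.class, File2Mapper.class);

        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path("/data/joined"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If file2 is small enough to fit in memory, a map-side join is cheaper:
ship it to every mapper with job.addCacheFile(...) and drop the second
mapper and the shuffle of its records entirely.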

::::::::::::::::::::::::::::::::::::::::
Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile  Tel: +91 (0)9899821370

On Tue, Jan 6, 2015 at 2:25 AM, hitarth trivedi <t.hitarth@gmail.com> wrote:

> Hi,
>
> I have a 6 node cluster, and the scenario is as follows :-
>
> I have one map reduce job which will write file1 to HDFS.
> I have another map reduce job which will write file2 to HDFS.
> In the third map reduce job I need to use file1 and file2 to do some
> computation and output the value.
>
> What is the best way to store file1 and file2 in HDFS so that they could
> be used in the third map reduce job?
>
> Thanks,
> Hitarth
>
