hadoop-mapreduce-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: Write and Read file through map reduce
Date Tue, 06 Jan 2015 13:43:51 GMT
The DistributedCache class has been deprecated for a while. You can use its
replacement, which is functionally equivalent and is discussed in this
thread:
http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api

Regards,
Shahab
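
The cache-based approach mentioned in this thread boils down to a map-side join: the small file is loaded into memory once per task, and each record of the large input is looked up against it. In Hadoop 2 the file is typically registered with `Job.addCacheFile(URI)` and read back in the mapper's `setup()` via `context.getCacheFiles()`. The sketch below deliberately avoids any Hadoop dependency and only models that logic in plain Java. The class name, method names, and the comma-separated `key,value` line format are assumptions made for illustration, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free model of a map-side join against a cached small file.
class CachedLookupJoin {

    // Stand-in for Mapper.setup(): parse the small file once into an in-memory map.
    static Map<String, String> loadLookup(List<String> smallFileLines) {
        Map<String, String> lookup = new HashMap<>();
        for (String line : smallFileLines) {
            String[] parts = line.split(",", 2); // assumed "key,value" layout
            if (parts.length == 2) {
                lookup.put(parts[0].trim(), parts[1].trim());
            }
        }
        return lookup;
    }

    // Stand-in for Mapper.map(): join one record of the large input.
    // Returns null when the line is malformed or the key has no match (inner join).
    static String joinRecord(String line, Map<String, String> lookup) {
        String[] parts = line.split(",", 2);
        if (parts.length < 2) {
            return null;
        }
        String key = parts[0].trim();
        String match = lookup.get(key);
        return match == null ? null : key + "," + parts[1].trim() + "," + match;
    }

    // Runs the "mapper" over every line of the large input.
    static List<String> joinAll(List<String> bigFileLines, List<String> smallFileLines) {
        Map<String, String> lookup = loadLookup(smallFileLines);
        List<String> out = new ArrayList<>();
        for (String line : bigFileLines) {
            String joined = joinRecord(line, lookup);
            if (joined != null) {
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("k1,a", "k2,b", "k3,c"); // large side
        List<String> file2 = Arrays.asList("k1,x", "k3,z");         // small, cached side
        for (String row : joinAll(file1, file2)) {
            System.out.println(row);
        }
    }
}
```

This only pays off when the cached file fits comfortably in each task's memory; otherwise a reduce-side join is the safer choice.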

On Mon, Jan 5, 2015 at 10:57 PM, unmesha sreeveni <unmeshabiju@gmail.com>
wrote:

> Hi Hitarth,
>
> If file1 and file2 are small, you can use the Distributed Cache, as
> described in [1].
>
> Or you can use MultipleInputs, as described in [2].
>
> [1]
> http://unmeshasreeveni.blogspot.in/2014/10/how-to-load-file-in-distributedcache-in.html
> [2]
> http://unmeshasreeveni.blogspot.in/2014/12/joining-two-files-using-multipleinput.html
>
> On Tue, Jan 6, 2015 at 8:53 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Hitarth:
>> You can also consider MultiFileInputFormat (and its concrete
>> implementations).
>>
>> Cheers
>>
>> On Mon, Jan 5, 2015 at 6:14 PM, Corey Nolet <cjnolet@gmail.com> wrote:
>>
>>> Hitarth,
>>>
>>> I don't know how much direction you are looking for with regard to the
>>> formats of the files, but you can certainly read both files into the third
>>> MapReduce job using FileInputFormat by comma-separating the paths to
>>> the files. The blocks of both files will essentially be unioned together
>>> and the mappers scheduled across your cluster.
>>>
>>> On Mon, Jan 5, 2015 at 3:55 PM, hitarth trivedi <t.hitarth@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a 6-node cluster, and the scenario is as follows:
>>>>
>>>> I have one MapReduce job that writes file1 to HDFS.
>>>> I have another MapReduce job that writes file2 to HDFS.
>>>> In the third MapReduce job I need to use file1 and file2 to do some
>>>> computation and output the value.
>>>>
>>>> What is the best way to store file1 and file2 in HDFS so that they
>>>> can be used in the third MapReduce job?
>>>>
>>>> Thanks,
>>>> Hitarth
>>>>
>>>
>>>
>>
>
>
> --
> *Thanks & Regards *
>
>
> *Unmesha Sreeveni U.B*
> *Hadoop, Bigdata Developer*
> *Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
> http://www.unmeshasreeveni.blogspot.in/
>
>
>
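
When both files are fed to the third job as ordinary inputs (comma-separated paths on FileInputFormat, or one mapper per file registered through `MultipleInputs.addInputPath`), the records arrive unioned together, so the usual pattern is a reduce-side join: each mapper tags its values with the source file, and the reducer pairs the values that share a key. The sketch below models that flow in plain Java with no Hadoop dependency. The class name, the `A`/`B` tags, and the comma-separated `key,value` line format are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free model of a reduce-side join over two tagged inputs.
class TaggedReduceJoin {

    // Stand-in for the map phase: emit (key, "tag:value") so the reducer
    // can tell which file each value came from.
    static void mapFile(String tag, List<String> lines, Map<String, List<String>> shuffle) {
        for (String line : lines) {
            String[] parts = line.split(",", 2); // assumed "key,value" layout
            if (parts.length < 2) {
                continue; // skip malformed lines
            }
            shuffle.computeIfAbsent(parts[0].trim(), k -> new ArrayList<>())
                   .add(tag + ":" + parts[1].trim());
        }
    }

    // Stand-in for the reduce phase: for each key, pair every file1 value
    // with every file2 value (inner join).
    static List<String> reduce(Map<String, List<String>> shuffle) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : shuffle.entrySet()) {
            List<String> left = new ArrayList<>();
            List<String> right = new ArrayList<>();
            for (String tagged : entry.getValue()) {
                if (tagged.startsWith("A:")) {
                    left.add(tagged.substring(2));
                } else if (tagged.startsWith("B:")) {
                    right.add(tagged.substring(2));
                }
            }
            for (String l : left) {
                for (String r : right) {
                    out.add(entry.getKey() + "," + l + "," + r);
                }
            }
        }
        return out;
    }

    static List<String> join(List<String> file1, List<String> file2) {
        // TreeMap keeps keys sorted, loosely mirroring the sorted shuffle.
        Map<String, List<String>> shuffle = new TreeMap<>();
        mapFile("A", file1, shuffle);
        mapFile("B", file2, shuffle);
        return reduce(shuffle);
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("k2,b", "k1,a");
        List<String> file2 = Arrays.asList("k1,x", "k1,y", "k3,z");
        for (String row : join(file1, file2)) {
            System.out.println(row);
        }
    }
}
```

Unlike the cache-based approach, this does not require either file to fit in memory, at the cost of shuffling both inputs across the cluster.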
