Subject: Re: Write and Read file through map reduce
From: Shahab Yunus <shahab.yunus@gmail.com>
To: user@hadoop.apache.org
Date: Tue, 6 Jan 2015 08:43:51 -0500

DistributedCache has been deprecated for a while. You can use the new
mechanism, which is functionally the same thing, discussed in this thread:

http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api
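A minimal sketch of what the replacement looks like (Hadoop 2.x,
org.apache.hadoop.mapreduce API; the HDFS path and class names below are
placeholders, not anything from this thread):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CacheExample {

        public static class SideFileMapper extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void setup(Context context) throws IOException, InterruptedException {
                // Replaces DistributedCache.getCacheFiles(): returns whatever
                // the driver registered with job.addCacheFile().
                URI[] cached = context.getCacheFiles();
                if (cached != null && cached.length > 0) {
                    // The '#side' fragment used in the driver symlinks the
                    // file into the task's working directory under that name.
                    BufferedReader reader = new BufferedReader(new FileReader("side"));
                    // ... load the side data here ...
                    reader.close();
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "cache-example");
            job.setJarByClass(CacheExample.class);
            // Replaces DistributedCache.addCacheFile(); the file must
            // already be in HDFS. Hypothetical path.
            job.addCacheFile(new URI("/user/hitarth/file1#side"));
            // ... set mapper/reducer, input and output paths, then:
            // job.waitForCompletion(true);
        }
    }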
Regards,
Shahab

On Mon, Jan 5, 2015 at 10:57 PM, unmesha sreeveni <unmeshabiju@gmail.com> wrote:

> Hi hitarth,
>
> If your file1 and file2 are small, you can use the Distributed Cache,
> as mentioned here [1].
>
> Or you can use MultipleInputs, as mentioned here [2] (a sketch follows
> at the end of this thread).
>
> [1] http://unmeshasreeveni.blogspot.in/2014/10/how-to-load-file-in-distributedcache-in.html
> [2] http://unmeshasreeveni.blogspot.in/2014/12/joining-two-files-using-multipleinput.html
>
> On Tue, Jan 6, 2015 at 8:53 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Hitarth:
>> You can also consider MultiFileInputFormat (and its concrete
>> implementations).
>>
>> Cheers
>>
>> On Mon, Jan 5, 2015 at 6:14 PM, Corey Nolet <cjnolet@gmail.com> wrote:
>>
>>> Hitarth,
>>>
>>> I don't know how much direction you are looking for with regards to
>>> the formats of the files, but you can certainly read both files into
>>> the third MapReduce job using FileInputFormat by comma-separating the
>>> paths to the files (a second sketch follows at the end of this
>>> thread). The blocks for both files will essentially be unioned
>>> together and the mappers scheduled across your cluster.
>>>
>>> On Mon, Jan 5, 2015 at 3:55 PM, hitarth trivedi <t.hitarth@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a 6-node cluster, and the scenario is as follows:
>>>>
>>>> I have one MapReduce job which will write file1 to HDFS.
>>>> I have another MapReduce job which will write file2 to HDFS.
>>>> In the third MapReduce job I need to use file1 and file2 to do some
>>>> computation and output the value.
>>>>
>>>> What is the best way to store file1 and file2 in HDFS so that they
>>>> can be used in the third MapReduce job?
>>>>
>>>> Thanks,
>>>> Hitarth
>
> --
> Thanks & Regards
>
> Unmesha Sreeveni U.B
> Hadoop, Bigdata Developer
> Centre for Cyber Security | Amrita Vishwa Vidyapeetham
> http://www.unmeshasreeveni.blogspot.in/
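Two sketches for the approaches suggested above. First, the MultipleInputs
route from unmesha's reply: each file is bound to its own mapper, and the
computation over both happens in a shared reducer (new API; the
mapper/reducer class names and paths are hypothetical placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class JoinDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "join-file1-file2");
            job.setJarByClass(JoinDriver.class);
            // Each input path gets its own mapper; both mappers emit records
            // keyed on the shared join key, so the reducer sees the matching
            // rows from file1 and file2 together.
            MultipleInputs.addInputPath(job, new Path("/data/file1"),
                    TextInputFormat.class, File1Mapper.class);  // hypothetical mapper
            MultipleInputs.addInputPath(job, new Path("/data/file2"),
                    TextInputFormat.class, File2Mapper.class);  // hypothetical mapper
            job.setReducerClass(JoinReducer.class);             // hypothetical reducer
            FileOutputFormat.setOutputPath(job, new Path("/data/out"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }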
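Second, Corey's comma-separated-paths route: both files flow through a
single mapper class, with their blocks unioned into one set of splits
(paths are placeholders again; this fragment drops into the usual driver
setup alongside the other job configuration):

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    // addInputPaths() accepts a comma-separated list of paths; splits from
    // both files are generated and scheduled across the cluster, all
    // handled by the one mapper class configured on the job.
    FileInputFormat.addInputPaths(job, "/data/file1,/data/file2");

This fits when file1 and file2 share a format and can be parsed by the
same mapper; when their formats differ, the MultipleInputs sketch above is
the better fit.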