From: Subroto Sanyal <subroto.sanyal@huawei.com>
To: mapreduce-user@hadoop.apache.org
Date: Fri, 09 Sep 2011 16:28:18 +0530
Subject: RE: Multiple files as input to a mapreduce job

Hi Shreya,

The functionality you are looking for is what Hive provides (internally via MapReduce). You could take a look at Hive's join logic, or simply use Hive directly. You can also look into org.apache.hadoop.mapred.lib.CombineFileInputFormat and the related classes for more details.

Regards,
Subroto Sanyal

-----Original Message-----
From: Shreya.Pal@cognizant.com [mailto:Shreya.Pal@cognizant.com]
Sent: Friday, September 09, 2011 11:54 AM
To: mapreduce-user@hadoop.apache.org
Subject: Multiple files as input to a mapreduce job

Hi,

The following is the scenario I have:

I have a Java program that reads multiple files from disk.

* There are 3 files (A, B, C) that are read and populated into 3 collections (ArrayList).
* There are 2 files, input1 and input2, that act as input to my program.
* I search for a keyword in file input1 and find the IDs corresponding to the matching entries.
* These IDs are then used in file input2 to fetch the entries corresponding to them.
There has to be a join between input1 and input2.

I want to convert this into a MapReduce program; is that possible? How can we read multiple input files in the mapper? Can we read files from disk?

Please advise.

Regards,
Shreya

This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. Any unauthorised review, use, disclosure, dissemination, forwarding, printing or copying of this email or any action taken in reliance on this e-mail is strictly prohibited and may be unlawful.
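[Archive note] The join being asked about is usually done as a reduce-side join: each record is tagged with the file it came from, the framework groups records by the join key, and the reducer pairs the two sides within each group. Below is a minimal, self-contained sketch of that idea in plain Java, where a TreeMap stands in for Hadoop's shuffle so the logic can run on its own; the `id<TAB>value` record layout and the sample data are assumptions for illustration, not part of the original thread.

```java
import java.util.*;

// Sketch of a reduce-side join. In Hadoop the shuffle groups records by
// key between the map and reduce phases; here a TreeMap stands in for it.
public class ReduceSideJoinSketch {

    // "Map" phase: tag each "id<TAB>value" record with its source file so
    // the reducer can tell which side of the join it came from.
    static void map(String source, List<String> records,
                    Map<String, List<String[]>> shuffle) {
        for (String rec : records) {
            String[] parts = rec.split("\t", 2);
            shuffle.computeIfAbsent(parts[0], k -> new ArrayList<>())
                   .add(new String[] { source, parts[1] });
        }
    }

    // "Reduce" phase: for one id, pair every input1 value with every
    // input2 value (an inner join on id).
    static List<String> reduce(String id, List<String[]> tagged) {
        List<String> left = new ArrayList<>(), right = new ArrayList<>();
        for (String[] t : tagged) {
            (t[0].equals("input1") ? left : right).add(t[1]);
        }
        List<String> out = new ArrayList<>();
        for (String l : left)
            for (String r : right)
                out.add(id + "\t" + l + "\t" + r);
        return out;
    }

    public static void main(String[] args) {
        List<String> input1 = Arrays.asList("1\tkeyword-match", "2\tother");
        List<String> input2 = Arrays.asList("1\tdetails-for-1", "3\tdetails-for-3");

        Map<String, List<String[]>> shuffle = new TreeMap<>();
        map("input1", input1, shuffle);
        map("input2", input2, shuffle);

        // Only id 1 appears on both sides, so only it survives the join.
        for (Map.Entry<String, List<String[]>> e : shuffle.entrySet())
            for (String joined : reduce(e.getKey(), e.getValue()))
                System.out.println(joined);
    }
}
```

In an actual job the same shape is expressed with MultipleInputs (org.apache.hadoop.mapred.lib.MultipleInputs in the old API, org.apache.hadoop.mapreduce.lib.input.MultipleInputs in the new one), which binds a separate Mapper class to each input path, and the shuffle performs the grouping by key. The three small files (A, B, C) can instead be placed on each node via the DistributedCache and loaded in the mapper's setup, since they are read whole rather than joined.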
