From: Raj K Singh <rajkrrsingh@gmail.com>
Date: Wed, 7 Jan 2015 13:39:07 +0530
Subject: Re: Write and Read file through map reduce
To: user@hadoop.apache.org

You can configure your third MapReduce job with MultipleInputs and read both files into the job. If one of the files is small, consider the DistributedCache, which will give you optimal performance when you are joining the datasets of file1 and file2. I would also recommend a job-scheduling API such as Oozie to make sure the third job kicks off only once file1 and file2 are available on HDFS (the same can be done with a shell script or a JobControl implementation).

::::::::::::::::::::::::::::::::::::::::
Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile Tel: +91 (0)9899821370

On Tue, Jan 6, 2015 at 2:25 AM, hitarth trivedi <t.hitarth@gmail.com> wrote:
> Hi,
>
> I have a 6-node cluster, and the scenario is as follows:
>
> I have one MapReduce job which writes file1 to HDFS.
> I have another MapReduce job which writes file2 to HDFS.
> In the third MapReduce job I need to use file1 and file2 to do some
> computation and output the value.
>
> What is the best way to store file1 and file2 in HDFS so that they can
> be used in the third MapReduce job?
>
> Thanks,
> Hitarth
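The MultipleInputs approach above can be sketched as a driver for the third job. This is a minimal sketch, not a drop-in solution: the paths (`/data/file1`, `/data/file2`, `/data/out`), the tab-separated record format, and the mapper/reducer logic are all hypothetical placeholders; only the `MultipleInputs.addInputPath` wiring is the point.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ThirdJobDriver {

    // Hypothetical mapper for file1: tag each value with its origin so the
    // reducer can tell the two sides apart. Assumes tab-separated records.
    public static class File1Mapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] f = value.toString().split("\t", 2);
            ctx.write(new Text(f[0]), new Text("F1\t" + (f.length > 1 ? f[1] : "")));
        }
    }

    // Hypothetical mapper for file2, same idea with a different tag.
    public static class File2Mapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] f = value.toString().split("\t", 2);
            ctx.write(new Text(f[0]), new Text("F2\t" + (f.length > 1 ? f[1] : "")));
        }
    }

    // The reducer sees file1 and file2 values grouped under the same key
    // and can do the actual computation; here it just concatenates them.
    public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            StringBuilder sb = new StringBuilder();
            for (Text v : values) sb.append(v).append('|');
            ctx.write(key, new Text(sb.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "join-file1-file2");
        job.setJarByClass(ThirdJobDriver.class);

        // One mapper per input path; both emit the same key/value types.
        MultipleInputs.addInputPath(job, new Path("/data/file1"),
                TextInputFormat.class, File1Mapper.class);
        MultipleInputs.addInputPath(job, new Path("/data/file2"),
                TextInputFormat.class, File2Mapper.class);

        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path("/data/out"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Running it requires a Hadoop cluster (or local runner) and the job jar, so it is shown here for the wiring only.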
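The DistributedCache suggestion amounts to a map-side join: the small file (say file2) is loaded into an in-memory table in each mapper's `setup()`, and every record of the large file is joined against it by hash lookup. Stripped of Hadoop types, the join logic is just this (a sketch with made-up keys and values, no Hadoop dependency):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapSideJoinSketch {

    // In a real mapper this table would be filled in setup() from the
    // DistributedCache copy of file2; here it is built inline.
    static Map<String, String> loadSmallSide() {
        Map<String, String> lookup = new HashMap<>();
        lookup.put("u1", "delhi");
        lookup.put("u2", "mumbai");
        return lookup;
    }

    // One map() call per record of the large side (file1): emit the joined
    // record when the key is present in the in-memory table, else drop it.
    static List<String> join(List<String[]> largeSide, Map<String, String> small) {
        List<String> out = new ArrayList<>();
        for (String[] rec : largeSide) {
            String city = small.get(rec[0]);
            if (city != null) {
                out.add(rec[0] + "\t" + rec[1] + "\t" + city);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> large = new ArrayList<>();
        large.add(new String[]{"u1", "42"});
        large.add(new String[]{"u3", "7"});   // no match in file2, dropped
        for (String line : join(large, loadSmallSide())) {
            System.out.println(line);         // prints: u1	42	delhi
        }
    }
}
```

This is why the advice is conditional on file size: the whole small side must fit in each mapper's heap, in exchange for which the join needs no shuffle at all.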
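For the shell-script alternative mentioned at the end, the availability check is a poll on `hdfs dfs -test -e`. A sketch, with hypothetical paths and jar name; it keys off the `_SUCCESS` marker that a completed MapReduce job writes to its output directory, so a half-written file1 or file2 is never picked up:

```shell
#!/bin/sh
# Block until both upstream jobs have finished writing to HDFS,
# then launch the third job. Paths and jar name are placeholders.
while ! hdfs dfs -test -e /data/file1/_SUCCESS \
   || ! hdfs dfs -test -e /data/file2/_SUCCESS; do
  sleep 60   # poll once a minute
done
hadoop jar third-job.jar ThirdJobDriver /data/out
```

Oozie's coordinator data dependencies and `JobControl`'s `addDependingJob` express the same ordering without hand-rolled polling, which is why they are the better choice once the pipeline grows.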