Date: Mon, 14 Apr 2014 19:28:18 +0530
Subject: How multiple input files are processed by mappers
From: Shashidhar Rao
To: user@hadoop.apache.org

Hi,

Could somebody please clarify my doubts? Say I have a cluster of 30 nodes and I want to put some files into HDFS. The combined size of all the files is 10 TB, each file is roughly 1 GB, and there are 10 files in total.

1. In a real production environment, do we copy these 10 files into HDFS under one folder, one by one? If so, how many mappers do we specify: 10? And do we use Hadoop's put command to transfer the files?

2. If not, do we pre-process the files to merge them into a single 10 TB file and copy that into HDFS?

Regards
Shashidhar