Subject: Re: How multiple input files are processed by mappers
From: Nitin Pawar
To: user@hadoop.apache.org
Date: Mon, 14 Apr 2014 19:42:03 +0530

1. In a real production environment, do we copy these 10 files into HDFS
under a folder one by one? If this is the case, then how many mappers do we
specify? 10 mappers? And do we use the put command of hadoop to transfer
these files?

Ans: This will depend on what you want to do with the files. There is no
rule which says that all files need to go into one folder. While uploading
files to hdfs via dfs clients (the native hadoop cli or your java dfs
client), you do not need mappers; it is a file system operation. Remember,
mappers are involved only if you call the mapreduce framework to process or
write the files. A normal file upload is purely a dfs operation.
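For example, a minimal sketch of an upload through the Java dfs client (the
paths and cluster config below are just placeholders, not from this thread);
the CLI equivalent is hadoop fs -put <src> <dst>:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUpload {
    public static void main(String[] args) throws Exception {
        // Reads fs.defaultFS (e.g. hdfs://namenode:8020) from core-site.xml
        // on the classpath; no mapreduce job is started at any point.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Plain filesystem call, same effect as:
        //   hadoop fs -put /local/data/file1 /data/input/file1
        fs.copyFromLocalFile(new Path("/local/data/file1"),
                             new Path("/data/input/file1"));
        fs.close();
    }
}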
2. If the above is not the case, then do we pre-process to merge these 10
files into one file of size 10 TB and copy that into hdfs?

Ans: You do not need to merge the files outside and then put them on hdfs,
as long as the individual files are reasonably sized. Once the data is on
hdfs, whether you want to merge it again depends on the purpose you want to
use it for.
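And on the mapper count: when you do run a mapreduce job over that folder,
the number of map tasks is not something you specify per file; with the
default FileInputFormat it falls out of the input splits, roughly one per
HDFS block. A rough sketch of a pass-through job that shows this (the
directory names are placeholders, and the 128 MB figure assumes the Hadoop 2
default block size):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplitCountDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-count-demo");
        job.setJarByClass(SplitCountDemo.class);

        // Point the job at the whole folder. Each file is split into
        // roughly one map task per HDFS block, so 10 files of 1 GB each
        // with 128 MB blocks give about 80 map tasks in total.
        FileInputFormat.addInputPath(job, new Path("/data/input"));
        FileOutputFormat.setOutputPath(job, new Path("/data/output"));

        // No mapper/reducer set: the identity defaults just pass records
        // through, which is enough to see the split count in the job UI.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}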
On Mon, Apr 14, 2014 at 7:28 PM, Shashidhar Rao <raoshashidhar123@gmail.com> wrote:
> Hi,
>
> Please can somebody clarify my doubts. Say I have a cluster of 30 nodes
> and I want to put the files in HDFS. All the files combined together are
> 10 TB in size, but each file is roughly say 1 GB only, and the total
> number of files is 10.
>
> 1. In a real production environment, do we copy these 10 files into HDFS
> under a folder one by one? If this is the case, then how many mappers do
> we specify? 10 mappers? And do we use the put command of hadoop to
> transfer these files?
>
> 2. If the above is not the case, then do we pre-process to merge these 10
> files into one file of size 10 TB and copy that into hdfs?
>
> Regards
> Shashidhar

--
Nitin Pawar