Subject: Re: How multiple input files are processed by mappers
From: Nitin Pawar
To: user@hadoop.apache.org
Date: Mon, 14 Apr 2014 19:42:03 +0530

1. In a real production environment, do we copy these 10 files into HDFS
under a folder one by one? If this is the case, then how many mappers do we
specify? 10 mappers? And do we use the put command of hadoop to transfer
these files?

Ans: This will depend on what you want to do with the files. There is no
rule which says that all files need to go into one folder. While uploading
files to hdfs via dfs clients (the native hadoop cli or your java dfs
client), you do not need mappers; it is a file system operation. Remember,
mappers are involved only if you call the mapreduce framework to process or
write the files. A normal file upload is purely a dfs operation.
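For example, a minimal sketch of an upload through the Java dfs client (the
paths and cluster config below are just placeholders, not from this thread);
the CLI equivalent is hadoop fs -put <src> <dst>:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUpload {
    public static void main(String[] args) throws Exception {
        // Reads fs.defaultFS (e.g. hdfs://namenode:8020) from core-site.xml
        // on the classpath; no mapreduce job is started at any point.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Plain filesystem call, same effect as:
        //   hadoop fs -put /local/data/file1 /data/input/file1
        fs.copyFromLocalFile(new Path("/local/data/file1"),
                             new Path("/data/input/file1"));
        fs.close();
    }
}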
2. If the above is not the case, then do we pre-process to merge these 10
files into one file of size 10 TB and copy that into hdfs?

Ans: You do not need to merge the files outside and then put them on hdfs,
as long as the individual files are reasonably sized. Once the data is on
hdfs, whether you want to merge it again depends on the purpose you want to
use it for.
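And on the mapper count: when you do run a mapreduce job over that folder,
the number of map tasks is not something you specify per file; with the
default FileInputFormat it falls out of the input splits, roughly one per
HDFS block. A rough sketch of a pass-through job that shows this (the
directory names are placeholders, and the 128 MB figure assumes the Hadoop 2
default block size):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplitCountDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-count-demo");
        job.setJarByClass(SplitCountDemo.class);

        // Point the job at the whole folder. Each file is split into
        // roughly one map task per HDFS block, so 10 files of 1 GB each
        // with 128 MB blocks give about 80 map tasks in total.
        FileInputFormat.addInputPath(job, new Path("/data/input"));
        FileOutputFormat.setOutputPath(job, new Path("/data/output"));

        // No mapper/reducer set: the identity defaults just pass records
        // through, which is enough to see the split count in the job UI.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}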
On Mon, Apr 14, 2014 at 7:28 PM, Shashidhar Rao <raoshashidhar123@gmail.com> wrote:
> Hi,
>
> Please can somebody clarify my doubts. Say I have a cluster of 30 nodes
> and I want to put the files in HDFS. All the files combined together are
> 10 TB in size, but each file is roughly say 1 GB only, and the total
> number of files is 10.
>
> 1. In a real production environment, do we copy these 10 files into HDFS
> under a folder one by one? If this is the case, then how many mappers do
> we specify? 10 mappers? And do we use the put command of hadoop to
> transfer these files?
>
> 2. If the above is not the case, then do we pre-process to merge these 10
> files into one file of size 10 TB and copy that into hdfs?
>
> Regards
> Shashidhar

--
Nitin Pawar