hadoop-common-user mailing list archives

From Shashidhar Rao <raoshashidhar...@gmail.com>
Subject How multiple input files are processed by mappers
Date Mon, 14 Apr 2014 13:58:18 GMT

Could somebody please clarify my doubts? Say I have a cluster of 30 nodes
and I want to put some files in HDFS. Combined, the files are 10 TB in
size, but each file is only roughly 1 GB, and there are 10 files in total.

1. In a real production environment, do we copy these 10 files into HDFS
under a folder one by one? If so, how many mappers do we specify: 10
mappers? And do we use Hadoop's put command to transfer the files?

2. If the above is not the case, do we pre-process the 10 files to merge
them into one file of size 10 TB and copy that into HDFS?

