hadoop-mapreduce-user mailing list archives

From Alok Kumar <alok...@gmail.com>
Subject Re: How multiple input files are processed by mappers
Date Mon, 14 Apr 2014 19:25:17 GMT

You can just use the put command to load the files into HDFS.
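If it helps, here is a minimal sketch of doing the same thing programmatically through the HDFS Java API; the local path /data/file1.dat and the HDFS directory /user/hadoop/input are made-up examples, not paths from your setup:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadIntoHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Equivalent of "hadoop fs -put": copy one local file into an HDFS directory.
    fs.copyFromLocalFile(new Path("/data/file1.dat"),
                         new Path("/user/hadoop/input/file1.dat"));
    fs.close();
  }
}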

Copying files into HDFS doesn't require a mapper or a map-reduce job.
Whether you really need a single merged file depends on your processing
logic (your map-reduce code).
Also, you can set the map.input.dir directory path in the job configuration.
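As a rough sketch of that job setup (the class and path names below are placeholders I chose, not anything from this thread), the usual way to point a job at an input directory is FileInputFormat.addInputPath, which fills in the input-directory property in the job configuration; with the default file-based input format the framework then derives the number of map tasks from the input splits of the files it finds there rather than from a number you specify:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ExampleDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "example job");
    job.setJarByClass(ExampleDriver.class);
    // Point the job at the HDFS directory holding all the input files;
    // this sets the input-dir property in the job configuration.
    FileInputFormat.addInputPath(job, new Path("/user/hadoop/input"));
    FileOutputFormat.setOutputPath(job, new Path("/user/hadoop/output"));
    // Mapper/Reducer classes would be set here as usual.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}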


On Mon, Apr 14, 2014 at 9:58 AM, Shashidhar Rao wrote:

> Hi,
> Please can somebody clarify my doubts? Say I have a cluster of 30 nodes
> and I want to put the files in HDFS. All the files combined come to
> 10 TB, but each file is roughly 1 GB only, and there are 10 files in total.
> 1. In a real production environment, do we copy these 10 files into HDFS
> under a folder one by one? If this is the case, then how many mappers do
> we specify: 10 mappers? And do we use Hadoop's put command to transfer
> these files?
> 2. If that is not the case, do we pre-process to merge these 10 files
> into one file of size 10 TB and copy that into HDFS?
> Regards
> Shashidhar
