hadoop-common-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: Total input paths number and output
Date Sat, 02 Oct 2010 16:58:18 GMT
The number of output files does not depend on the number of input paths. It
equals the number of reducers (for a full MapReduce job) or the number of
input splits (for a map-only job).

The number of splits is determined in most cases by the input file sizes and
the HDFS block size (dfs.block.size) the files were created with.
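
To make the split arithmetic concrete, here is a rough, self-contained sketch
in plain Java. It does not call Hadoop. The class and method names are
hypothetical, and the formula (including the requested-map-count hint, the
minimum split size, and the 1.1 slack factor) is an assumption based on how
the old-API FileInputFormat is commonly understood to behave, so check it
against the source of the version you run.

```java
// Hypothetical helper, not part of Hadoop: estimates how many splits
// a single file would produce under the assumed split-size rule.
class SplitEstimate {
    // Assumed slack factor: the last split may be up to 10% oversized.
    static final double SPLIT_SLOP = 1.1;

    // Assumed rule: max(minSize, min(totalSize / requestedMaps, blockSize)).
    static long splitSize(long totalSize, int requestedMaps,
                          long minSize, long blockSize) {
        long goalSize = totalSize / Math.max(requestedMaps, 1);
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Carve the file into chunks of splitSize until the remainder fits
    // within the slack factor, then count the remainder as one more split.
    static int countSplits(long fileSize, long splitSize) {
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            remaining -= splitSize;
            splits++;
        }
        if (remaining > 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;   // illustrative block size
        long fileSize = 200 * mb;   // illustrative file size
        long size = splitSize(fileSize, 1, 1, blockSize);
        System.out.println("splits: " + countSplits(fileSize, size));
    }
}
```

Under these assumptions, a map-only job over this file would write one output
file per split, while a job with reducers would write one output file per
reducer regardless of the split count.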

On Oct 2, 2010 10:01 PM, "Shi Yu" <shiyu@uchicago.edu> wrote:


I am running some code on a cluster with several nodes (ranging from 1 to
30) using hadoop-0.19.2. In a test, I put only a single file under the
input folder; however, each time the logged "total input paths to
process" is 2 (not 1).

INFO mapred.FileInputFormat: Total input paths to process : 2

The job produces two identical output files, one named -00000 and the other
-00001. Nothing is really wrong, but why are there 2 inputs and 2 outputs?
I also tried reducing the cluster to 1 node (removing all the nodes from the
conf/slaves file) and changing the dfs.replication property in the xml file
to 1, but neither had any effect. I tried different input formats and the
result is always the same. Where can I find the parameter that controls
this? Thanks.

