hadoop-common-user mailing list archives

From Jun Young Kim <juneng...@gmail.com>
Subject How can I choose the proper block size if the input file size is dynamic?
Date Tue, 22 Feb 2011 08:57:21 GMT
Hi, all.

I know the dfs.blocksize key can affect Hadoop's performance.

In my case, I have thousands of directories containing many input files of different sizes (from 10 KB to 1 GB).

In this situation, how can I choose dfs.blocksize to get the best performance?

11/02/22 17:45:49 INFO input.FileInputFormat: Total input paths to process : *15407*
11/02/22 17:45:54 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
11/02/22 17:45:54 INFO mapreduce.JobSubmitter: number of splits:*15411*
11/02/22 17:45:54 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
11/02/22 17:45:54 INFO mapreduce.Job: Running job: job_201102221737_0002
11/02/22 17:45:55 INFO mapreduce.Job:  map 0% reduce 0%
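(For context: the log shows 15407 input paths producing 15411 splits, i.e. roughly one split per file, because FileInputFormat splits each file independently and a file smaller than the block size still yields one whole split. Below is a rough Python sketch of that split arithmetic; it is not Hadoop's actual code, and the file-size mix at the end is a hypothetical one chosen only to illustrate how such counts can arise.)

```python
# Sketch of how FileInputFormat derives input splits from files.
# Each file is split on its own, so changing dfs.blocksize only
# affects files larger than the block size; many small files still
# mean many splits (and many map tasks).

SPLIT_SLOP = 1.1  # FileInputFormat lets the last split run up to 10% over

def count_splits(file_size, block_size):
    """Number of splits one file of file_size bytes would produce."""
    if file_size == 0:
        return 1  # a zero-length file still contributes one empty split
    splits = 0
    remaining = file_size
    while remaining / block_size > SPLIT_SLOP:
        splits += 1
        remaining -= block_size
    return splits + 1  # the final (possibly short) split

# Hypothetical mix: 15403 small files (one split each) plus 4 files of
# two blocks each (two splits each) against a 64 MB block size.
block = 64 * 1024 * 1024
small_files = [10 * 1024] * 15403            # 10 KB each
big_files = [128 * 1024 * 1024] * 4          # 2 blocks each
total = sum(count_splits(s, block) for s in small_files + big_files)
print(total)  # -> 15411 splits from 15407 files
```

The point of the sketch: with one split per small file, raising dfs.blocksize cannot reduce the split count below the file count; only consolidating small files (or a combining input format) changes that.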


Junyoung Kim (juneng603@gmail.com)
