hadoop-common-dev mailing list archives

From Esteban Molina-Estolano <eesto...@soe.ucsc.edu>
Subject Re: Adding new filesystem to Hadoop causing too many Map tasks
Date Mon, 04 Jun 2007 20:15:40 GMT
Thanks for the advice. I am using an old version.
I'm trying to upgrade to 0.12.3, but when I try to compile (even  
without adding in my own code) I get:

[eestolan@issdm-1 ~/hadoop-0.12.3]$ ant
Buildfile: build.xml


/cse/grads/eestolan/hadoop-0.12.3/build.xml:114: Specify at least one  
source--a file or resource collection.

Total time: 0 seconds

That line in build.xml has the following:

     <touch datetime="01/25/1971 2:00 pm">
       <fileset dir="${conf.dir}" includes="**/*.template"/>
       <fileset dir="${contrib.dir}" includes="**/*.template"/>
     </touch>

What might be causing the error?

     ~ Esteban

On Jun 1, 2007, at 9:26 AM, Owen O'Malley wrote:

> On Jun 1, 2007, at 1:14 AM, Esteban Molina-Estolano wrote:
>> I'm having trouble with a small test: RandomWriter, 4 TaskTracker  
>> nodes, 5 maps per node, 10 MB per map, for a total of 200 MB over  
>> 20 Map tasks. I tried it on Hadoop with DFS, and it took about 30  
>> seconds. Then, I ran the same test using Ceph. I changed  
>> fs.default.name to "ceph:///"; added fs.ceph.impl as  
>> org.apache.hadoop.fs.ceph.CephFileSystem; and left all other  
>> configuration settings untouched. It ran horrifically slowly.
>> Then the JobTracker spawned 400 Map tasks:
>> I'm ending up with way too many Map tasks, and as a result the job  
>> takes way too long to run.
> That is really strange, especially because RandomWriter isn't  
> looking at any real inputs. (Unless you are using version 0.11 or  
> earlier of Hadoop...)  Are you using an old version of Hadoop? If  
> so, I'd suspect it has something to do with the blocksize for the  
> input files being too small (likely 1 byte or so). You need to  
> return much bigger numbers for FileSystem.getBlockSize(Path), or
> map/reduce will default to making very small input splits.
> -- Owen
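
To make Owen's point about block size concrete, here is a minimal sketch of how a tiny block size can inflate the number of input splits. It assumes a simplified rule, split size = max(minSize, min(goalSize, blockSize)), which may not match the exact logic of any particular Hadoop release. The class and method names (SplitSizeSketch, computeSplitSize, countSplits) are hypothetical and use no Hadoop APIs, and the 200 MB figure is treated as if it were input data purely for illustration.

```java
public class SplitSizeSketch {
    // Assumed, simplified rule (not copied from any Hadoop release):
    // never go below minSize, never exceed the smaller of goalSize and blockSize.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits needed to cover fileSize bytes (ceiling division).
    static long countSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long totalSize = 200 * mb;                  // 200 MB, as in the test described above
        long requestedMaps = 20;                    // 4 nodes x 5 maps per node
        long goalSize = totalSize / requestedMaps;  // 10 MB per map
        long minSize = 1;                           // hypothetical minimum split size

        // Compare a file system reporting 1-byte blocks with one reporting 64 MB blocks.
        long[] blockSizes = {1L, 64 * mb};
        for (long blockSize : blockSizes) {
            long splitSize = computeSplitSize(goalSize, minSize, blockSize);
            System.out.println("blockSize=" + blockSize
                    + " splitSize=" + splitSize
                    + " splits=" + countSplits(totalSize, splitSize));
        }
    }
}
```

Under this model, a reported block size of 1 byte caps the split size at 1 byte, so the split count grows with the data size instead of staying at the requested 20. A block size at or above the 10 MB goal leaves the split count at 20. The model does not attempt to reproduce the exact figure of 400 tasks.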
