hadoop-common-dev mailing list archives

From "Owen O'Malley" <...@yahoo-inc.com>
Subject Re: Adding new filesystem to Hadoop causing too many Map tasks
Date Fri, 01 Jun 2007 16:26:06 GMT

On Jun 1, 2007, at 1:14 AM, Esteban Molina-Estolano wrote:

> I'm having trouble with a small test: RandomWriter, 4 TaskTracker  
> nodes, 5 maps per node, 10 MB per map, for a total of 200 MB over  
> 20 Map tasks. I tried it on Hadoop with DFS, and it took about 30  
> seconds. Then, I ran the same test using Ceph. I changed  
> fs.default.name to "ceph:///"; added fs.ceph.impl as  
> org.apache.hadoop.fs.ceph.CephFileSystem; and left all other  
> configuration settings untouched. It ran horrifically slowly.
> Then the JobTracker spawned 400 Map tasks. I'm ending up with way
> too many Map tasks, and as a result the job takes way too long to run.

That is really strange, especially because RandomWriter isn't looking
at any real inputs (unless you are using version 0.11 or earlier of
Hadoop). Are you using an old version? If so, I'd suspect it has
something to do with the block size for the input files being too
small (likely 1 byte or so). You need to return much bigger numbers
from FileSystem.getBlockSize(Path), or map/reduce will default to
making very small input splits (see the sketch below).

-- Owen
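
To make the suggestion concrete, here is a minimal sketch of a
FileSystem subclass that reports a large block size. Only the method
name FileSystem.getBlockSize(Path) comes from the message above; the
package, class body, and the 64 MB constant are assumptions for
illustration, and the rest of the FileSystem implementation is omitted.

    package org.apache.hadoop.fs.ceph;

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch only: the other abstract FileSystem methods must also be
    // implemented before this class will compile and run.
    public class CephFileSystem extends FileSystem {

      // 64 MB is an assumed value, picked to match the DFS default of
      // the era; the point is only that it be much larger than a byte.
      private static final long BLOCK_SIZE = 64L * 1024 * 1024;

      // Map/reduce derives its input split size from this number, so a
      // tiny value here shatters each input file into many small splits,
      // and each split becomes its own map task.
      public long getBlockSize(Path file) throws IOException {
        return BLOCK_SIZE;
      }

      // ... remaining FileSystem methods omitted ...
    }

With a block size in that range, the 200 MB job described above should
come out near its intended 20 map tasks rather than 400.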
