hadoop-common-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Optimizing Hadoop MR with File Based File Systems
Date Wed, 06 May 2009 17:55:48 GMT
Jonathan Seidman wrote:
> We've created an implementation of FileSystem which allows us to use Sector
> (http://sector.sourceforge.net/) as the backing store for Hadoop. This
> implementation is functionally complete, and we can now run Hadoop MapReduce
> jobs against data stored in Sector.

Please consider contributing this to Hadoop.

> We're now looking at how to optimize
> this interface, since the performance suffers considerably compared to MR
> processing run against HDFS.

Have you tried setting mapred.min.split.size to a large value, so that
files are not generally split?  Alternatively, you might override
FileInputFormat#computeSplitSize.
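
To illustrate why a large minimum split size keeps files whole, here is a minimal, self-contained sketch of the split-size arithmetic. It assumes the formula max(minSize, min(goalSize, blockSize)), which is how the old-API FileInputFormat#computeSplitSize is understood to work, with goalSize being the total input size divided by the requested number of map tasks. The class name, the file and block sizes, and the simplified split count (which ignores the slop factor Hadoop applies to the last split) are hypothetical and for illustration only; no Hadoop classes are used.

```java
// Sketch only: mirrors the assumed split-size formula, not Hadoop's actual class.
class SplitSizeSketch {

    // Assumed formula: the minimum split size wins whenever it exceeds
    // both the goal size and the block size reported by the file system.
    public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Simplified split count: ceiling of fileLength / splitSize,
    // written to avoid overflow when splitSize is very large.
    public static long countSplits(long fileLength, long splitSize) {
        return fileLength / splitSize + (fileLength % splitSize == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        long fileLength = 1L << 30;     // hypothetical 1 GiB input file
        long blockSize = 64L << 20;     // hypothetical 64 MiB reported block size
        long goalSize = fileLength / 2; // two map tasks requested

        // Small minimum split size: the block size bounds the split.
        long smallMin = computeSplitSize(goalSize, 1L, blockSize);
        System.out.println("min=1: splits = " + countSplits(fileLength, smallMin));

        // Very large minimum split size: the file is never split.
        long largeMin = computeSplitSize(goalSize, Long.MAX_VALUE, blockSize);
        System.out.println("min=Long.MAX_VALUE: splits = " + countSplits(fileLength, largeMin));
    }
}
```

In a real job, the same effect would come either from setting mapred.min.split.size in the job configuration or from a FileInputFormat subclass whose computeSplitSize returns a value at least as large as the biggest input file.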

Doug
