hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: how to set the number of mappers with 0 reducers?
Date Tue, 20 Sep 2011 08:34:53 GMT
Hello Wei!

On Tue, Sep 20, 2011 at 1:25 PM, Peng, Wei <Wei.Peng@xerox.com> wrote:
> However, the output from the mappers results in many small files (each
> ~50k, while the block size is 64M, so it wastes a lot of space).
> How can I set the number of mappers (say 100)?

If I understand correctly, what you're looking for is a way to 'pack'
several files into each mapper.

In that case, you need to check out the CombineFileInputFormat. It can
pack several files per mapper (with some degree of locality).
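As a rough illustration, a driver for this might look like the sketch below. This is a job-configuration fragment, not a complete program: it assumes a later Hadoop release where CombineTextInputFormat (a concrete subclass of CombineFileInputFormat) and the setMaxInputSplitSize helper are available; on older releases you would subclass CombineFileInputFormat yourself. The paths and the 64 MB cap are placeholders.

```java
// Sketch only: pack many small files into fewer, larger splits.
// Assumes a Hadoop release that ships CombineTextInputFormat.
Job job = Job.getInstance(new Configuration(), "pack-small-files");
job.setInputFormatClass(CombineTextInputFormat.class);
// Cap each combined split at roughly one block (64 MB here, a placeholder).
CombineTextInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
FileInputFormat.addInputPath(job, new Path("/user/wei/input"));   // placeholder path
FileOutputFormat.setOutputPath(job, new Path("/user/wei/output")); // placeholder path
job.setNumReduceTasks(0); // map-only job, as in your case
```

The max-split-size setting is what controls how many small files get grouped into one mapper, so it indirectly controls the mapper count.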

Alternatively, pass a list of files (as a text file) as your input,
and have your Mapper logic read them one by one. This way, if you
divide 50k filenames over 100 files, you will get 100 mappers as you
want - but at the cost of losing almost all locality.

> If there is no way to set the number of mappers, the only way to solve
> it is "cat" some files together?

Concatenating is an alternative, if affordable - yes. You can lower
the file count (down from 50k) that way, at the cost of a one-time
pass over the data.

Harsh J
