hadoop-common-user mailing list archives

From Soumya Banerjee <soumya.sbaner...@gmail.com>
Subject Re: how to set the number of mappers with 0 reducers?
Date Tue, 20 Sep 2011 09:06:04 GMT

If you want all your map outputs in a single file, you can use an
IdentityReducer and set the number of reducers to 1.
This ensures that all of your mapper output goes to that one reducer, which
writes it into a single file.
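
A minimal sketch of that setup with the old `org.apache.hadoop.mapred` API,
which is the API that ships IdentityReducer. The class name SingleOutputJob,
the job name, and the use of IdentityMapper as a stand-in for your own mapper
are assumptions for illustration; it needs the Hadoop jars on the classpath.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Hypothetical driver class: args[0] is the input path, args[1] the output path.
public class SingleOutputJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SingleOutputJob.class);
        conf.setJobName("single-output");

        // Placeholder mapper: substitute your own Mapper class here.
        conf.setMapperClass(IdentityMapper.class);

        // Pass every record through unchanged, using exactly one reduce task.
        conf.setReducerClass(IdentityReducer.class);
        conf.setNumReduceTasks(1);

        // Key/value types produced by the default TextInputFormat plus
        // IdentityMapper; change these to match your own mapper's output.
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```

With one reduce task the output directory holds a single part file
(part-00000). The trade-off is that all map output is shuffled and sorted
through one task, so this is likely to be slow for large outputs.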


On Tue, Sep 20, 2011 at 2:04 PM, Harsh J <harsh@cloudera.com> wrote:

> Hello Wei!
> On Tue, Sep 20, 2011 at 1:25 PM, Peng, Wei <Wei.Peng@xerox.com> wrote:
> (snip)
> > However, the output from the mappers results in many small files (size is
> > ~50k, while the block size is 64M, so it wastes a lot of space).
> >
> > How can I set the number of mappers (say 100)?
> What you're looking for is to 'pack' several files per mapper, if I
> get it right.
> In that case, you need to check out the CombineFileInputFormat. It can
> pack several files per mapper (with some degree of locality).
> Alternatively, pass a list of files (as a text file) as your input,
> and have your Mapper logic read them one by one. This way, if you
> divide 50k filenames over 100 files, you will get 100 mappers as you
> want - but at the cost of losing almost all locality.
> > If there is no way to set the number of mappers, the only way to solve
> > it is "cat" some files together?
> Concatenating is an alternative, if affordable - yes. You can lower
> the file count (down from 50k) this way.
> --
> Harsh J
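
Harsh's second suggestion above (pass lists of file names as the input) needs
a small preprocessing step that spreads the file names over the desired number
of list files. A rough sketch in plain Java follows; the class name
FileListSplitter, the round-robin assignment, and the list-NNNNN.txt naming
are assumptions, not something from the thread. Each list file is small, so it
would typically become one input split and therefore one mapper, and the
mapper itself would still have to open each listed path.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one listing of file names into N smaller listings.
public class FileListSplitter {

    // Assign names to `parts` lists round-robin so the list sizes differ by at most one.
    public static List<List<String>> split(List<String> names, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            out.add(new ArrayList<>());
        }
        for (int i = 0; i < names.size(); i++) {
            out.get(i % parts).add(names.get(i));
        }
        return out;
    }

    // Usage: java FileListSplitter <listing.txt> <outDir> <parts>
    public static void main(String[] args) throws IOException {
        List<String> names = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Path outDir = Files.createDirectories(Paths.get(args[1]));
        int parts = Integer.parseInt(args[2]);

        List<List<String>> lists = split(names, parts);
        for (int i = 0; i < lists.size(); i++) {
            // One list file per intended mapper, one file name per line.
            Path listFile = outDir.resolve(String.format("list-%05d.txt", i));
            Files.write(listFile, lists.get(i), StandardCharsets.UTF_8);
        }
    }
}
```

The resulting list files would then be copied into HDFS and used as the job's
input directory. As noted in the reply, this gives up almost all data
locality, because a mapper is scheduled near its list file rather than near
the files named in it.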
