hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: InputFiles, Splits, Maps, Tasks Questions 1.3 Base
Date Thu, 18 Oct 2007 23:04:37 GMT
Lance Amundsen wrote:
> Thx, I'll give that a try.   Seems to me a method to tell hadoop to split a
> file every "n" key/value pairs would be logical.  Or maybe a
> createSplitBoundary when appending key/value records?

Splits should not require examining the data: that would not scale.  So
they are instead made at arbitrary byte boundaries.
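
To illustrate the point, here is a minimal sketch of byte-boundary splitting that looks only at the file length and never at its records. The class and method names (ByteSplits, split) are hypothetical and are not part of Hadoop; the real split logic lives in the InputFormat and may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Hadoop code: cut a file into byte ranges
// using only its length, without reading any records.
class ByteSplits {

    /** Returns {offset, length} pairs that cover [0, fileLength) with no gaps. */
    static List<long[]> split(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            // The last split is shorter when splitSize does not divide fileLength.
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new long[] {offset, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        for (long[] s : split(100, 30)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

Because a boundary chosen this way can land in the middle of a record, the reader for each split has to find the first record start inside its own range. That work happens when the split is read, not when the splits are computed.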

> I just want a way, and not a real complex way, of directing the # of maps
> and the breakdown of records going to them.  Creating a separate file per
> record group is too slow for my purposes.

Just set the number of map tasks.  That should mostly do what you want 
in this case.  If you want finer-grained control, implement your own 
InputFormat.
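
As a rough sketch of why setting the number of map tasks only "mostly" works: in the classic org.apache.hadoop.mapred API the value passed to JobConf.setNumMapTasks(int) is a hint, and the file-based input formats turn it into a target split size that is then clamped by a minimum size and by the block size. The code below is a standalone illustration of that clamping. The class name SplitSizeHint and its parameters are hypothetical, and the exact formula in any given Hadoop release is an assumption here.

```java
// Hypothetical illustration, not Hadoop code: how a requested map count
// might translate into a per-split byte size.
class SplitSizeHint {

    /** Clamp the goal size between the minimum split size and the block size. */
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Derive a split size from the total input size and the requested map count. */
    static long splitSizeFor(long totalBytes, int requestedMaps,
                             long minSize, long blockSize) {
        long goalSize = totalBytes / Math.max(1, requestedMaps);
        return computeSplitSize(goalSize, minSize, blockSize);
    }

    public static void main(String[] args) {
        long total = 1L << 30;   // 1 GiB of input (assumed)
        long block = 64L << 20;  // 64 MiB block size (assumed)
        // Asking for more maps than blocks shrinks the splits.
        System.out.println(splitSizeFor(total, 100, 1, block));
        // Asking for fewer maps than blocks is capped at the block size.
        System.out.println(splitSizeFor(total, 4, 1, block));
    }
}
```

Under these assumptions, requesting more maps than there are blocks gives smaller splits, while requesting fewer leaves one split per block. Control over exactly which records go to which map needs a custom InputFormat.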

Doug
