Lance Amundsen wrote:
> Thx, I'll give that a try. Seems to me a method to tell hadoop to split a
> file every "n" key/value pairs would be logical. Or maybe a
> createSplitBoundary when appending key/value records?
Splits should not require examining the data: that's not scalable. So
they instead fall on arbitrary byte boundaries.
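To illustrate the point, here is a minimal, hypothetical sketch (not Hadoop's actual code) of computing splits purely from the file length, never reading the data itself. The class and method names are made up for illustration:

```java
// Hypothetical sketch: split boundaries derived only from file length,
// so no record data is ever examined to plan the splits.
public class SplitSketch {
    /** Returns the start byte offsets of each split. */
    static long[] splitOffsets(long fileLength, int numSplits) {
        long splitSize = (fileLength + numSplits - 1) / numSplits; // ceiling
        int n = (int) ((fileLength + splitSize - 1) / splitSize);
        long[] offsets = new long[n];
        for (int i = 0; i < n; i++) {
            offsets[i] = i * splitSize;   // arbitrary byte boundary
        }
        return offsets;
    }

    public static void main(String[] args) {
        // A 1000-byte file divided among 4 map tasks: boundaries land at
        // byte offsets 0, 250, 500, 750, regardless of record boundaries.
        for (long off : splitOffsets(1000, 4)) {
            System.out.println(off);
        }
    }
}
```

The record reader for each split then skips forward to the first record boundary at or after its start offset, which is why the split planner itself never needs to look at the data.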
> I just want a way, and not a real complex way, of directing the # of maps
> and the breakdown of records going to them. Creating a separate file per
> record group is too slow for my purposes.
Just set the number of map tasks. That should mostly do what you want
in this case. If you want finer-grained control, implement your own
InputFormat.
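(In the old API, the number of map tasks is a hint set via JobConf's setNumMapTasks.) For the finer-grained case, the core of a custom InputFormat is its getSplits method. The following self-contained sketch, with made-up names and standing in for real Hadoop types, shows the grouping such an implementation could do: one split per n consecutive records, given known record boundary offsets:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the grouping a custom InputFormat's
// getSplits could perform: one split per n consecutive records.
public class GroupSplits {
    /** Each long[]{start, length} pair covers n consecutive records. */
    static List<long[]> splitsEveryN(long[] recordOffsets,
                                     long fileLength, int n) {
        List<long[]> splits = new ArrayList<>();
        for (int i = 0; i < recordOffsets.length; i += n) {
            long start = recordOffsets[i];
            // End of this split: start of the record n ahead, or EOF.
            long end = (i + n < recordOffsets.length)
                    ? recordOffsets[i + n]
                    : fileLength;
            splits.add(new long[] { start, end - start });
        }
        return splits;
    }
}
```

Note the trade-off Doug alludes to: this only works if record offsets are already known (e.g. from an index), since scanning the file to find them defeats the purpose of cheap split planning.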
Doug