hadoop-mapreduce-user mailing list archives

From "George P. Stathis" <gstat...@traackr.com>
Subject Limiting the number of data records processed per reduce process
Date Mon, 27 Sep 2010 18:27:47 GMT
Possible beginner's question here but I can't find an obvious answer in the
docs. Is there a way to configure a job such that it imposes a cap on the
number of records each reduce process receives at a time, regardless of how
the data was partitioned or how many reducers were configured for the job?
The limitation here is that one does not know the number of records that
will be processed ahead of time so as to manually configure the number of
reducers. The obvious workaround is to do a first pass to count the records
and then a second one that sets the number of reducers so as to match the
target maximum number of records per reducer. But I'm hoping for a more
elegant alternative.
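For reference, the two-pass workaround described above boils down to a ceiling
division: given a record count from the counting pass, pick enough reducers so
that none exceeds the cap, then pass that to `Job.setNumReduceTasks(...)` on
the second job. A minimal sketch (the class and method names here are
illustrative, not Hadoop API; it also assumes the partitioner spreads keys
roughly evenly, which the default HashPartitioner does not guarantee):

```java
public class ReducerSizing {
    // Ceiling division: the smallest reducer count such that an even
    // split leaves at most maxPerReducer records on each reducer.
    static int reducersFor(long totalRecords, long maxPerReducer) {
        return (int) ((totalRecords + maxPerReducer - 1) / maxPerReducer);
    }

    public static void main(String[] args) {
        // e.g. 1,000,000 records counted in the first pass, cap of 50,000
        System.out.println(reducersFor(1_000_000, 50_000)); // prints 20
        // The result would then be handed to job.setNumReduceTasks(...)
        // before submitting the second job.
    }
}
```

Note that even with this sizing, skewed keys can still overload a single
reducer, since all values for one key go to the same reduce call.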

Thank you in advance for your time.

-GS
