hadoop-mapreduce-user mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Re: Predicting how many values will I see in a call to reduce?
Date Tue, 09 Nov 2010 16:28:13 GMT
On Sun, Nov 7, 2010 at 5:38 AM, Anthony Urso <anthony.urso@gmail.com> wrote:

> Is there any way to know how many values I will see in a call to
> reduce without first counting through them all with the iterator?

No, there currently isn't. The framework doesn't have that information until
the iterator is exhausted. The values are not held in memory; they are
produced on the fly by an N-way merge sort over data from disk and memory. If
your application needs the count, it has to compute it itself.
If your value sets are small enough to fit in memory, the easiest thing to
do is read them from the iterator into a list (cloning each value, because the
framework reuses the same object for every value it hands you!).
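A minimal, Hadoop-free sketch of the buffer-and-clone approach. It assumes a hypothetical `MutableValue` class standing in for a Writable such as `Text`, and a hypothetical `reusingValues` helper that mimics the framework returning the same object on every `next()`. In real reducer code, something like `new Text(value)` or `WritableUtils.clone(value, conf)` would play the role of `copy()`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BufferedReduceSketch {

    // Hypothetical stand-in for a mutable Hadoop Writable such as Text.
    static final class MutableValue {
        private String data = "";
        void set(String s) { data = s; }
        String get() { return data; }
        // Deep copy used before buffering, so later overwrites don't affect it.
        MutableValue copy() {
            MutableValue m = new MutableValue();
            m.set(data);
            return m;
        }
    }

    // Hypothetical helper: every call to next() returns the SAME object,
    // overwritten with the next value, as the reduce-side iterator does.
    static Iterable<MutableValue> reusingValues(String... raw) {
        return () -> new Iterator<MutableValue>() {
            private final MutableValue shared = new MutableValue();
            private int i = 0;
            public boolean hasNext() { return i < raw.length; }
            public MutableValue next() {
                shared.set(raw[i++]);
                return shared;
            }
        };
    }

    // Buffers all values so their count is known before processing them.
    static List<MutableValue> bufferValues(Iterable<MutableValue> values) {
        List<MutableValue> buffered = new ArrayList<>();
        for (MutableValue v : values) {
            buffered.add(v.copy()); // clone: the iterator will overwrite v
        }
        return buffered;
    }

    public static void main(String[] args) {
        List<MutableValue> copied = bufferValues(reusingValues("a", "b", "c"));
        System.out.println("count = " + copied.size()); // prints "count = 3"
        for (MutableValue v : copied) System.out.print(v.get() + " "); // prints "a b c "
        System.out.println();

        // Contrast: storing the reused object directly yields three
        // references to one instance, all holding the last value.
        List<MutableValue> aliased = new ArrayList<>();
        for (MutableValue v : reusingValues("a", "b", "c")) aliased.add(v);
        for (MutableValue v : aliased) System.out.print(v.get() + " "); // prints "c c c "
        System.out.println();
    }
}
```

The trade-off is memory: the whole value set for a key is materialized, so this only works when each key's values comfortably fit in the reducer's heap.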

You could try using the resettable iterators, but I don't know how reliable
they are.

-- Owen
