crunch-user mailing list archives

From Rahul <>
Subject Re: Exception due to same iterator returned back by PGroupedTableType
Date Thu, 28 Jun 2012 06:05:46 GMT
Hi Gabriel,

I am doing n*(n-1) comparisons here: every element is compared with
every other element, so a peeking iterator would not help much. It would
give me the next element, but I need to keep all the elements that have
already been accessed in another Collection so that I can iterate over
them again and again.
Or is there something else that would help here?
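
The buffering approach described above could look roughly like the following. This is a minimal sketch in plain Java, not Crunch code: the class name PairwiseBuffer and the helpers buffer and allPairs are hypothetical, and the anonymous Iterable in main only stands in for the grouped values by handing back the same iterator on every call. Hadoop may also reuse the same object instance for successive values in a reducer, so elements might need to be copied before they are buffered; whether that applies here depends on the types involved and is not confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PairwiseBuffer {

    // Copies a single-pass Iterable into a List so it can be traversed repeatedly.
    // Assumption: the elements are safe to hold on to; copy them first if the
    // framework reuses value objects.
    static <T> List<T> buffer(Iterable<T> values) {
        List<T> copy = new ArrayList<T>();
        for (T value : values) {
            copy.add(value);
        }
        return copy;
    }

    // Visits every ordered pair at distinct positions: n*(n-1) pairs in total.
    // The pair is rendered as a String only to keep the sketch short.
    static <T> List<String> allPairs(List<T> items) {
        List<String> pairs = new ArrayList<String>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = 0; j < items.size(); j++) {
                if (i != j) {
                    pairs.add(items.get(i) + "," + items.get(j));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the grouped values: every call to
        // iterator() returns the same underlying iterator.
        final Iterator<Integer> once = Arrays.asList(1, 2, 3).iterator();
        Iterable<Integer> singlePass = new Iterable<Integer>() {
            public Iterator<Integer> iterator() {
                return once;
            }
        };

        List<Integer> buffered = buffer(singlePass);
        System.out.println(allPairs(buffered).size());
    }
}
```

The trade-off is memory: all values for one key are held in the list at once, which is only reasonable when the groups are small enough to fit.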


On 27-06-2012 17:48, Gabriel Reid wrote:
> On Wed, Jun 27, 2012 at 1:41 PM, Rahul<>  wrote:
>> I am trying to create multiple iterators in a DoFn process method.
>>   public void process(Pair<Integer, Iterable<TupleN>>  input,
>>          Emitter<Pair<String, Integer>>  emitter) {}
>> Every time I ask for an iterator it gives back the same one, and thus I
>> cannot traverse the list again and again; I am hitting the following
>> stack trace.
> The Iterable.iterator call always returns the same iterator because this
> is the behaviour that is inherited from the reduce method of the Hadoop
> Reducer class (and this behaviour is there because of the underlying way
> in which Hadoop MapReduce functions). In both Crunch and pure MapReduce,
> you've just got one shot at looping over an Iterable in a reducer (or DoFn
> that is functioning on a PGroupedTable).
> If I understood your code correctly, you're trying to loop over an Iterable
> while looking at two consecutive elements at a time. Probably the easiest
> way of doing this is using the PeekingIterator class in Google Guava
> (
> This will allow
> you to look one element ahead within an iterator.
> Regards,
> Gabriel
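
The one-element look-ahead that Gabriel describes can be illustrated with a small hand-written wrapper. This is a plain-Java stand-in for the idea, not Guava's own class (Guava exposes it through Iterators.peekingIterator); the names LookAhead and countIncreasingSteps are hypothetical and exist only for this sketch.

```java
import java.util.Iterator;

public class LookAhead<T> implements Iterator<T> {
    private final Iterator<T> source;
    private T peeked;
    private boolean hasPeeked;

    public LookAhead(Iterator<T> source) {
        this.source = source;
    }

    public boolean hasNext() {
        return hasPeeked || source.hasNext();
    }

    // Returns the next element without consuming it.
    public T peek() {
        if (!hasPeeked) {
            peeked = source.next(); // throws NoSuchElementException when exhausted
            hasPeeked = true;
        }
        return peeked;
    }

    // Returns the next element and advances past it.
    public T next() {
        if (!hasPeeked) {
            return source.next();
        }
        T result = peeked;
        hasPeeked = false;
        peeked = null;
        return result;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    // Example use: in a single pass, count adjacent pairs where the
    // current element is smaller than the one that follows it.
    static int countIncreasingSteps(Iterator<Integer> values) {
        LookAhead<Integer> it = new LookAhead<Integer>(values);
        int count = 0;
        while (it.hasNext()) {
            Integer current = it.next();
            if (it.hasNext() && current < it.peek()) {
                count++;
            }
        }
        return count;
    }
}
```

This only covers comparisons between consecutive elements in one pass; comparing every element with every other element still requires buffering the values first.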
