crunch-user mailing list archives

From Rahul <rsha...@xebia.com>
Subject Re: Exception due to same iterator returned back by PGroupedTableType
Date Thu, 28 Jun 2012 06:05:46 GMT
Hi Gabriel,

I am doing n*(n-1) comparisons here: every element is compared with
every other element, so a peeking iterator would not help much. It would
give me the next element, but I need to keep all the elements that have
already been accessed in another Collection so that I can iterate over
them again and again.
Or is there something else that would help here?
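
To make the buffering idea concrete, here is a minimal sketch in plain Java, with no Crunch dependency. The class and method names (PairwiseBuffer, singlePass, orderedPairs) are hypothetical and only for illustration; singlePass simulates an Iterable that hands back the same iterator on every call. It assumes the whole group fits in memory. If the runtime reuses value objects between iterations, each element may also need to be copied before it is stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class PairwiseBuffer {

    /** Simulates a one-shot Iterable: every iterator() call returns the same iterator. */
    public static <T> Iterable<T> singlePass(final Iterator<T> source) {
        return new Iterable<T>() {
            @Override
            public Iterator<T> iterator() {
                return source;
            }
        };
    }

    /** Copies the values into a List in a single pass so they can be traversed repeatedly. */
    public static <T> List<T> buffer(Iterable<T> values) {
        List<T> buffered = new ArrayList<T>();
        for (T value : values) {
            buffered.add(value);
        }
        return buffered;
    }

    /** Returns all ordered pairs of elements at different positions: n*(n-1) pairs. */
    public static <T> List<List<T>> orderedPairs(Iterable<T> values) {
        List<T> buffered = buffer(values); // the only traversal of the original Iterable
        List<List<T>> pairs = new ArrayList<List<T>>();
        for (int i = 0; i < buffered.size(); i++) {
            for (int j = 0; j < buffered.size(); j++) {
                if (i != j) {
                    pairs.add(Arrays.asList(buffered.get(i), buffered.get(j)));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Iterable<Integer> once = singlePass(Arrays.asList(1, 2, 3).iterator());
        System.out.println(orderedPairs(once).size()); // prints 6
    }
}
```

Inside a DoFn, the same pattern would mean buffering input.second() once at the top of process and running the nested loops over the buffered list.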

regards,
Rahul

On 27-06-2012 17:48, Gabriel Reid wrote:
> On Wed, Jun 27, 2012 at 1:41 PM, Rahul<rsharma@xebia.com>  wrote:
>> I am trying to create multiple iterators in a DoFn process method.
>>
>>   public void process(Pair<Integer, Iterable<TupleN>> input,
>>          Emitter<Pair<String, Integer>> emitter) {}
>>
>> Every time I ask for an iterator it gives back the same one, and thus I
>> cannot traverse the list again and again, as I am hitting the following
>> stack trace.
> The Iterable.iterator call always returns the same iterator because this
> is the behaviour inherited from the reduce method of the Hadoop Reducer
> class (and this behaviour is there because of the underlying way in which
> Hadoop MapReduce functions). In both Crunch and pure MapReduce, you've
> just got one shot at looping over an Iterable in a reducer (or a DoFn
> that is operating on a PGroupedTable).
>
> If I understood your code correctly, you're trying to loop over an Iterable
> while looking at two consecutive elements at a time. Probably the easiest
> way of doing this is using the PeekingIterator class in Google Guava
> (http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/collect/Iterators.html#peekingIterator%28java.util.Iterator%29).
> This will allow you to look one element ahead within an iterator.
>
> Regards,
>
> Gabriel
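
For the consecutive-elements case Gabriel describes, a single pass is enough. The sketch below does not use Guava's PeekingIterator; it is a plain-Java stand-in that remembers the previous element instead of peeking ahead, and the names (ConsecutivePairs, consecutivePairs) are hypothetical. It asks the Iterable for its iterator exactly once. As with any approach that holds on to an element across iterations, a copy may be needed if the runtime reuses value objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ConsecutivePairs {

    /** Returns the pairs (e0, e1), (e1, e2), ... using one traversal of the Iterable. */
    public static <T> List<List<T>> consecutivePairs(Iterable<T> values) {
        List<List<T>> pairs = new ArrayList<List<T>>();
        Iterator<T> it = values.iterator(); // requested exactly once
        if (!it.hasNext()) {
            return pairs;
        }
        T previous = it.next();
        while (it.hasNext()) {
            T current = it.next();
            pairs.add(Arrays.asList(previous, current));
            previous = current; // slide the two-element window forward
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(consecutivePairs(Arrays.asList(1, 2, 3, 4)));
        // prints [[1, 2], [2, 3], [3, 4]]
    }
}
```

This covers adjacent comparisons only; comparing every element with every other element still requires holding the group in memory.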

