incubator-crunch-user mailing list archives

From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: Exception due to same iterator returned back by PGroupedTableType
Date Thu, 28 Jun 2012 07:00:57 GMT
Hi Rahul,

OK, it looks like I misunderstood your code. In that case you're correct
that a PeekingIterator won't help you: you will need to store the data for
each group in a collection in order to do the processing that you're
trying to do.
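
As a rough illustration of that buffering approach, here is a minimal,
framework-free sketch. It deliberately uses no Crunch or Hadoop classes:
BufferedPairs, buffer, orderedPairs and singlePass are hypothetical names
made up for this example, and singlePass only imitates the "same iterator
every time" behaviour of a grouped Iterable. It also assumes that each group
fits in memory and that the values are not reused by the framework between
iterations; if they are, each value would need to be copied before it is
stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class BufferedPairs {

    /** Copies a single-pass Iterable into a list so it can be traversed repeatedly. */
    static <T> List<T> buffer(Iterable<T> values) {
        List<T> buffered = new ArrayList<>();
        for (T value : values) {
            buffered.add(value);
        }
        return buffered;
    }

    /** Returns all n*(n-1) ordered pairs of elements taken from distinct positions. */
    static <T> List<List<T>> orderedPairs(Iterable<T> values) {
        // The input is consumed exactly once here; all later passes use the copy.
        List<T> buffered = buffer(values);
        List<List<T>> pairs = new ArrayList<>();
        for (int i = 0; i < buffered.size(); i++) {
            for (int j = 0; j < buffered.size(); j++) {
                if (i != j) {
                    pairs.add(Arrays.asList(buffered.get(i), buffered.get(j)));
                }
            }
        }
        return pairs;
    }

    /** Hypothetical stand-in for a grouped Iterable: iterator() always returns the same iterator. */
    static <T> Iterable<T> singlePass(List<T> source) {
        final Iterator<T> shared = source.iterator();
        return () -> shared;
    }

    public static void main(String[] args) {
        Iterable<String> group = singlePass(Arrays.asList("a", "b", "c"));
        // Three elements give 3 * 2 = 6 ordered pairs.
        System.out.println(orderedPairs(group).size());
    }
}
```

In a real DoFn the same idea would apply to the Iterable handed to process():
copy it into a list first, then run the nested loops over the list rather
than over the Iterable itself.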

Am I correct in assuming that this code is an attempt to get familiar with
Crunch, and less about solving a real-world problem right now? If you are
trying to put together a solution for a specific problem, maybe you could
outline what you're trying to achieve, as there may be a better way to get
there. I noticed that you're grouping values by the hash code of the input
line, which looks questionable to me.

Regards,

Gabriel

On Thu, Jun 28, 2012 at 8:05 AM, Rahul <rsharma@xebia.com> wrote:
> Hi Gabriel,
>
> I am doing n*(n-1) comparisons here: every element is compared with
> every other element, so a peeking iterator would not help much. It would
> give me the next element, but I need to keep all the elements that have
> already been accessed in another Collection so that I can iterate over
> them again and again.
> Or is there something else that would help here?
>
> regards,
> Rahul
>
>
> On 27-06-2012 17:48, Gabriel Reid wrote:
>>
>> On Wed, Jun 27, 2012 at 1:41 PM, Rahul<rsharma@xebia.com>  wrote:
>>>
>>> I am trying to create multiple iterators in a DoFn process method.
>>>
>>>  public void process(Pair<Integer, Iterable<TupleN>>  input,
>>>         Emitter<Pair<String, Integer>>  emitter) {}
>>>
>>> Every time I ask for an iterator it gives back the same one, and thus I
>>> cannot traverse the list again and again; I am hitting the following
>>> stack trace.
>>
>> The Iterable.iterator call always returns the same iterator because this
>> is the behaviour that is inherited from the reduce method of the Hadoop
>> Reducer class (and this behaviour is there because of the underlying way
>> in which Hadoop MapReduce functions). In both Crunch and pure MapReduce,
>> you've just got one shot at looping over an Iterable in a reducer (or in
>> a DoFn that is functioning on a PGroupedTable).
>>
>> If I understood your code correctly, you're trying to loop over an
>> Iterable while looking at two consecutive elements at a time. Probably
>> the easiest way of doing this is using the PeekingIterator class in
>> Google Guava
>> (http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/collect/Iterators.html#peekingIterator%28java.util.Iterator%29).
>> This will allow you to look one element ahead within an iterator.
>>
>> Regards,
>>
>> Gabriel
>
>
