flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Till Rohrmann <trohrm...@apache.org>
Subject Re: [DISCUSS] Should we supply a new Iterator instance for Functions with Iterable input(s) like CoGroupFunction ?
Date Wed, 22 Feb 2017 10:19:58 GMT
Hi Lin Li,

I think the oversight is more that we don’t throw a TraversableOnceException
if you request more than one iterator as it is the case for the Iterables
used for the non collection mode. Otherwise you will have a different
behaviour for the collection and the non collection mode.

In general, you’re right Lin Li that we don’t honour the Iterable contract
which should allow you to create an arbitrary number of iterators over the
data. Honestly, I’m not sure why we did this change because it’s not very
intuitive. Maybe Ufuk can chime in because he opened FLINK-1023.

To give you some background why we don’t allow the Iterable to return
multiple iterators over the data is that we would have to keep all the data
around in case the user creates a new iterator. Given that the data might
grow quite big, this can be a burden. With the iterator contract you know
that you can free the resources once the current element has been processed.

Cheers,
Till
​

On Wed, Feb 22, 2017 at 11:10 AM, Aljoscha Krettek <aljoscha@apache.org>
wrote:

> Hi,
> this is probably an oversight. If it helps you implement the feature,
> please go ahead and add a sub-issue for solving the Iterator problem.
>
> Best,
> Aljoscha
>
> On Tue, 21 Feb 2017 at 16:13 Lin Li <lincoln.86xy@gmail.com> wrote:
>
> > Hi,
> >
> >     When I try to implement
> > https://issues.apache.org/jira/browse/FLINK-5498
> > via "dataset.coGroup(another dataset)" with a generated
> > CoGroupFunction.(CoGroupFunction
> > interface: public void coGroup(Iterable<IN1> first, Iterable<IN2> second,
> > Collector<O> out)
> >
> >      I couldn't get the right results, then I saw the backend Iterator
> did
> > not supply a new instance when invoked the "Iterable.iterator()" after
> > debugging.
> > (see  org.apache.flink.api.common.operators.util.ListKeyGroupedIterator,
> >  it differs from usual iterable collections in java which will implement
> > the iterator() method that supply a new iterator instance for the
> > collection. And this is not mentioned either in comments or document.)
> >
> > IMO, iterable collections' new iterator instance requirements probably
> > useful for other cases, so is it necessary to add this feature?
> > Greatful if someone can tell me the motivation that
> ListKeyGroupedIterator
> > didn't supply a new iterator instance.
> >
> > What do you think?
> >
> > Best, Lincoln
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message