flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Types in the Python API
Date Fri, 31 Jul 2015 05:03:56 GMT
I think then the Python part would just serialize all the tuple fields to a
big byte array, and all the key fields to another array, so that the Java
side can do comparisons on the whole "key blob".

Maybe it's overly simplistic, but it might work. :D
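
A minimal sketch of the "key blob" idea, written in Python only for illustration. The helper names (encode_field, encode_fields, to_blobs, group_by_key_blob) and the byte layout are assumptions made up for this example, not part of Flink or its Python API. The grouping step stands in for what the Java side would do: it compares raw bytes only and never looks inside the blobs.

```python
import struct
from itertools import groupby


def encode_field(value):
    """Encode one field deterministically (hypothetical layout: tag + payload)."""
    if isinstance(value, int):
        return b"i" + struct.pack(">q", value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return b"s" + struct.pack(">I", len(data)) + data
    raise TypeError("unsupported field type: %r" % type(value))


def encode_fields(fields):
    """Concatenate the encodings of all fields into one byte array."""
    return b"".join(encode_field(f) for f in fields)


def to_blobs(record, key_positions):
    """Return (key_blob, value_blob) for one tuple."""
    key_blob = encode_fields(record[i] for i in key_positions)
    value_blob = encode_fields(record)
    return key_blob, value_blob


def group_by_key_blob(blob_pairs):
    """Group value blobs by key blob using byte-wise comparison only."""
    ordered = sorted(blob_pairs, key=lambda pair: pair[0])
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=lambda pair: pair[0])
    ]


records = [("a", 1, 10), ("b", 2, 20), ("a", 1, 30)]
pairs = [to_blobs(r, key_positions=(0, 1)) for r in records]
groups = group_by_key_blob(pairs)
```

Equal keys produce identical bytes, so grouping works. The byte order is not the natural order of the values (negative integers, for example, sort after positive ones in this layout), so a meaningful sort order would still have to come from the Python side.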

On Thu, 30 Jul 2015 at 23:35 Chesnay Schepler <c.schepler@web.de> wrote:

> I can see this working for basic types, but am unsure how it would work
> with Tuples. Wouldn't the Java API still need to know the arity to set up
> serializers?
>
> On 30.07.2015 23:02, Aljoscha Krettek wrote:
> > I believe it should be possible to create a special PythonTypeInfo where
> > the Python side is responsible for serializing data to a byte array, and to
> > the Java side it is just a byte array, with all comparisons also
> > performed on these byte arrays. I think partitioning and sorting should still
> > work, since the sorting is (in most cases) only used to group the elements
> > for a groupBy(). If a proper sort order were required, this would have to
> > be done on the Python side.
> >
> > On Thu, 30 Jul 2015 at 22:21 Chesnay Schepler <c.schepler@web.de> wrote:
> >
> >> To be perfectly honest, I never really managed to work my way through
> >> Spark's Python API; it's a whole bunch of magic to me, and not even the
> >> general structure is understandable.
> >>
> >> With "pure python" do you mean doing everything in Python, as in just
> >> having serialized data on the Java side?
> >>
> >> I believe the way to do this with Flink is to add a switch that
> >> a) disables all type checks
> >> b) creates serializers dynamically at runtime.
> >>
> >> a) should be fairly straightforward, b) on the other hand....
> >>
> >> By the way, the Python API itself doesn't require the type information; it
> >> already does the b) part.
> >>
> >> On 30.07.2015 22:11, Gyula Fóra wrote:
> >>> That I understand, but could you please tell me how this is done
> >>> differently in Spark, for instance?
> >>>
> >>> What would we need to change to make this work with pure Python (as it
> >>> seems to be possible)? This probably has large performance implications,
> >>> though.
> >>>
> >>> Gyula
> >>>
> >>> Chesnay Schepler <c.schepler@web.de> wrote (on Thu, 30 Jul 2015, 22:04):
> >>>
> >>>> Because it still goes through the Java API, which requires some kind of
> >>>> type information. Imagine a Java API program where you omit all generic
> >>>> types; it just wouldn't work as of now.
> >>>>
> >>>> On 30.07.2015 21:17, Gyula Fóra wrote:
> >>>>> Hey!
> >>>>>
> >>>>> Could anyone briefly tell me what exactly is the reason why we force the
> >>>>> users in the Python API to declare types for operators?
> >>>>>
> >>>>> I don't really understand how this works in different systems, but I am just
> >>>>> curious why Flink has types and why Spark doesn't, for instance.
> >>>>>
> >>>>> If you give me some pointers to read that would also be fine :)
> >>>>>
> >>>>> Thank you,
> >>>>> Gyula
> >>>>>
> >>
>
>
