flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Metzger <rmetz...@apache.org>
Subject Re: zipWithIndex in Python API
Date Mon, 14 Mar 2016 09:37:53 GMT
Hi Shannon,

I'm happy to see some community engagement on our Python APIs!

On Fri, Mar 11, 2016 at 6:32 PM, Chesnay Schepler <chesnay@apache.org>
wrote:

> The subtaskIndex is not currently exposed to the python operator.
>
> Fortunately this can be changed very easily:
> On the java side, within PythonStreamer.startPython() the python process
> is started and several parameters are transferred (L.129++) using
> stdin/-out.
> These parameters are received on the python side in Environment.execute()
> (L.168++).
>
> So the transfer is rather straight-forward, after that you only have to
> modify the operator.configure() method to
> also take a subtaskIndex argument, modify the RuntimeContext constructor,
> add a getIndexOfThisSubtask() method and you're set.
>
> Feel free to open a JIRA for this.
>
>
> On 11.03.2016 18:15, Shannon Quinn wrote:
>
>> Hi all,
>>
>> I'm interested in getting involved the Python API development. The first
>> use-case I've encountered in my work is that of zipWithIndex, so I started
>> looking into how to go about implementing that. It looks like the core of
>> it involves being able to uniquely identify what worker you're currently in
>> between distributed calls; the Scala end has
>> getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime context
>> is more or less limited to the broadcast variables.
>>
>> Happy to hear any hints as to how I should get started with this. Thanks.
>>
>> Regards,
>> Shannon
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message