flink-dev mailing list archives

From Chesnay Schepler <ches...@apache.org>
Subject Re: zipWithIndex in Python API
Date Fri, 11 Mar 2016 17:32:43 GMT
The subtaskIndex is not currently exposed to the python operator.

Fortunately this can be changed very easily:
On the Java side, the Python process is started within
PythonStreamer.startPython(), and several parameters are transferred
to it (L.129++) using stdin/stdout.
These parameters are received on the Python side in
Environment.execute() (L.168++).

So the transfer is rather straightforward. After that you only have to
modify the operator.configure() method to also take a subtaskIndex
argument, modify the RuntimeContext constructor, and add a
getIndexOfThisSubtask() method, and you're set.
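
As a rough sketch of where this could lead (not the actual Flink code):
the class below is a hypothetical stand-in for the Python RuntimeContext
once it carries a subtask index, and zip_with_index() simulates the job
locally on plain lists, one list per subtask. The names subtask_index,
get_index_of_this_subtask, get_broadcast_variable and the "counts"
variable are assumptions made for illustration only.

```python
class RuntimeContext(object):
    """Hypothetical stand-in for the Python API's runtime context."""

    def __init__(self, broadcast_variables=None, subtask_index=0):
        # Assumed constructor shape: the index is passed in next to the
        # broadcast variables when the operator is configured.
        self._broadcast_variables = dict(broadcast_variables or {})
        self._subtask_index = subtask_index

    def get_broadcast_variable(self, name):
        return self._broadcast_variables[name]

    def get_index_of_this_subtask(self):
        return self._subtask_index


def zip_with_index(partitions):
    """Simulate zipWithIndex over a list of partitions, one per subtask."""
    # Pass 1: every subtask reports how many elements it holds.
    counts = [len(partition) for partition in partitions]

    # Pass 2: every subtask offsets its local positions by the number of
    # elements held by all lower-indexed subtasks.
    result = []
    for subtask_index, partition in enumerate(partitions):
        ctx = RuntimeContext({"counts": counts}, subtask_index)
        index = ctx.get_index_of_this_subtask()
        offset = sum(ctx.get_broadcast_variable("counts")[:index])
        result.extend((offset + i, value) for i, value in enumerate(partition))
    return result


indexed = zip_with_index([["a", "b"], ["c"], [], ["d", "e"]])
```

In a real job the per-subtask counts would have to be computed in a
first pass and shared with every subtask, for example as a broadcast
variable; the simulation just builds them in place.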

Feel free to open a JIRA for this.

On 11.03.2016 18:15, Shannon Quinn wrote:
> Hi all,
>
> I'm interested in getting involved in the Python API development. The 
> first use-case I've encountered in my work is that of zipWithIndex, so 
> I started looking into how to go about implementing that. It looks 
> like the core of it involves being able to uniquely identify what 
> worker you're currently in between distributed calls; the Scala end 
> has getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime 
> context is more or less limited to the broadcast variables.
>
> Happy to hear any hints as to how I should get started with this. Thanks.
>
> Regards,
> Shannon
>

