flink-dev mailing list archives

From Shannon Quinn <squ...@gatech.edu>
Subject Re: zipWithIndex in Python API
Date Mon, 14 Mar 2016 12:43:53 GMT
I'm a Python guru; if it doesn't have a Python API, I'll likely help 
make one :)

Work is bad this week but I'm planning to get started on this next week!


On 3/14/16 5:37 AM, Robert Metzger wrote:
> Hi Shannon,
> I'm happy to see some community engagement on our Python APIs!
> On Fri, Mar 11, 2016 at 6:32 PM, Chesnay Schepler <chesnay@apache.org>
> wrote:
>> The subtaskIndex is not currently exposed to the python operator.
>> Fortunately this can be changed very easily:
>> On the Java side, within PythonStreamer.startPython() the Python process
>> is started and several parameters are transferred (L.129++) using
>> stdin/stdout.
>> These parameters are received on the Python side in Environment.execute()
>> (L.168++).
>> So the transfer is rather straightforward; after that you only have to
>> modify the operator.configure() method to
>> also take a subtaskIndex argument, modify the RuntimeContext constructor,
>> add a getIndexOfThisSubtask() method and you're set.
>> Feel free to open a JIRA for this.
>> On 11.03.2016 18:15, Shannon Quinn wrote:
>>> Hi all,
>>> I'm interested in getting involved in the Python API development. The first
>>> use-case I've encountered in my work is that of zipWithIndex, so I started
>>> looking into how to go about implementing that. It looks like the core of
>>> it involves being able to uniquely identify what worker you're currently in
>>> between distributed calls; the Scala end has
>>> getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime context
>>> is more or less limited to the broadcast variables.
>>> Happy to hear any hints as to how I should get started with this. Thanks.
>>> Regards,
>>> Shannon
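
To illustrate why zipWithIndex needs to know which worker it is running in, here is a minimal sketch in plain Python with no Flink calls. It assumes a two-pass approach: each subtask counts its elements, then each subtask numbers its elements starting from the total count of all lower-indexed subtasks. The function name zip_with_index and the list-of-lists stand-in for parallel partitions are hypothetical and used only for illustration.

```python
from itertools import accumulate


def zip_with_index(partitions):
    """Assign consecutive global indices to elements spread over partitions.

    `partitions` is a plain list of lists standing in for parallel subtasks;
    position i in the outer list plays the role of the subtask index.
    """
    # Pass 1: each subtask counts its own elements.
    counts = [len(part) for part in partitions]
    # Subtask i starts numbering after all elements of subtasks 0..i-1.
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: each subtask numbers its elements starting at its own offset.
    return [
        [(offsets[i] + j, item) for j, item in enumerate(part)]
        for i, part in enumerate(partitions)
    ]


if __name__ == "__main__":
    print(zip_with_index([["a", "b"], [], ["c"]]))
```

In a real job the second pass runs inside each parallel operator instance, which is where the subtask index is needed to pick the right offset.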

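The Python-side changes described in Chesnay's reply above could look roughly like the sketch below. This is not the actual Flink code: the class bodies, the snake_case method name get_index_of_this_subtask (standing in for getIndexOfThisSubtask()), the helper read_subtask_index, and the assumption that the index arrives as a single decimal line are all hypothetical, and the real wire format used by PythonStreamer may differ.

```python
import io


class RuntimeContext(object):
    """Hypothetical stand-in for the Python API's runtime context."""

    def __init__(self, broadcast_variables, subtask_index):
        self._broadcast_variables = broadcast_variables
        self._subtask_index = subtask_index

    def get_broadcast_variable(self, name):
        return self._broadcast_variables[name]

    def get_index_of_this_subtask(self):
        # New accessor: which parallel instance this operator is running in.
        return self._subtask_index


class Operator(object):
    """Hypothetical operator whose configure() also takes the subtask index."""

    def __init__(self):
        self.context = None

    def configure(self, broadcast_variables, subtask_index):
        self.context = RuntimeContext(broadcast_variables, subtask_index)


def read_subtask_index(stream):
    # Assumption: the Java side writes the index as one decimal text line.
    return int(stream.readline().strip())


if __name__ == "__main__":
    fake_stdin = io.StringIO("3\n")  # simulates the parameter sent from Java
    op = Operator()
    op.configure({}, read_subtask_index(fake_stdin))
    print(op.context.get_index_of_this_subtask())
```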