flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shannon Quinn (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3626) zipWithIndex in Python API
Date Wed, 16 Mar 2016 20:24:33 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198087#comment-15198087

Shannon Quinn commented on FLINK-3626:

It looks like the first step (thanks to Chesney Schepler for the guidance) is to provide a
handle to the Python environment for obtaining a subtask ID.

  1: Modify org.apache.flink.python.api.streaming.data.PythonStreamer.java's startPython()
to send the output of getIndexOfThisSubtask() through the process output stream.

  2: Modify flink-libraries / flink-python / src / main / python / org / apache / flink /
python / api / flink / plan / Environment.py's execute() method to receive the new input and
pass it through to the operator's _configure() method.

  3: Modify Function.py's _configure() method to pass the subtask ID to the runtime context

  4: Modify RuntimeContext.py to include a simple getIndexOfThisSubtask() method that returns
the ID.

Once these steps are complete, I can start on the zipWithIndex job itself.

> zipWithIndex in Python API
> --------------------------
>                 Key: FLINK-3626
>                 URL: https://issues.apache.org/jira/browse/FLINK-3626
>             Project: Flink
>          Issue Type: New Feature
>          Components: Python API
>    Affects Versions: 1.0.0
>         Environment: OS X 10.11.3, 16GB RAM, 500GB SSD, Core i7 2.5GHz.
>            Reporter: Shannon Quinn
>            Priority: Minor
> Implementation of a `zipWithIndex` method for the Python API on Flink. This will affix
each record with a sequential integer ID, consistent across the distributed data structure.
> Work here: https://github.com/magsol/flink

This message was sent by Atlassian JIRA

View raw message