beam-commits mailing list archives

From "Pablo Estrada (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (BEAM-1442) Performance improvement of the Python DirectRunner
Date Wed, 22 Feb 2017 21:44:44 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879034#comment-15879034 ]

Pablo Estrada edited comment on BEAM-1442 at 2/22/17 9:43 PM:
--------------------------------------------------------------

Hi Haoxiang,
It's great that you find the project interesting. It is a challenging (and exciting) project.
We would like a detailed proposal because, as you may guess, the project is not easy, and
we want to help you (or any student) understand the DirectRunner well before you are selected.

With this in mind, we suggest you include the following items in the proposal:
(1) Introduction - Introduce the project.
(2) Goals
(3) Implementation - of a benchmark and of the runner improvements. Be as specific and
detailed as possible.
(4) Timeline
(5) Self-introduction - Introduce yourself too.

Feel free to ask questions, or share your train of thought here, and we can help you polish
the proposal to make it robust - and help you familiarize yourself with the DirectRunner.


> Performance improvement of the Python DirectRunner
> --------------------------------------------------
>
>                 Key: BEAM-1442
>                 URL: https://issues.apache.org/jira/browse/BEAM-1442
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py
>            Reporter: Pablo Estrada
>            Assignee: Ahmet Altay
>              Labels: gsoc2017, mentor, python
>
> The DirectRunners for Python and Java are intended to act as policy enforcers and
> correctness checkers for Beam pipelines, but some users run data processing tasks on them.
> Currently, the Python DirectRunner has less-than-great performance, although some work
> has gone into improving it. There are more opportunities for improvement.
> Skills for this project:
> * Python
> * Cython (nice to have)
> * Working through the Beam getting started materials (nice to have)
> To start figuring out this problem, it is advisable to run a simple pipeline and study
> the `Pipeline.run` and `DirectRunner.run` methods. Ask questions directly on JIRA.
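
One possible way to follow that advice is to run a tiny pipeline under Python's built-in profiler and see where the time goes. The sketch below is only an illustration: the names `profile_callable` and `run_simple_pipeline` and the pipeline contents are hypothetical and not taken from the issue, and `run_simple_pipeline` assumes the `apache_beam` package is installed.

```python
import cProfile
import io
import pstats
import time


def profile_callable(fn, top=10):
    """Run fn() under cProfile and return (elapsed_seconds, report_text).

    Hypothetical helper for illustration; it is not part of Beam.
    """
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.enable()
    try:
        fn()
    finally:
        profiler.disable()
    elapsed = time.perf_counter() - start

    # Render the `top` most expensive entries, sorted by cumulative time.
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return elapsed, out.getvalue()


def run_simple_pipeline(n=1000):
    """Square n integers and sum them on the DirectRunner.

    Assumes apache_beam is installed; the import is local so that the
    profiling helper above can be used without it.
    """
    import apache_beam as beam

    # Leaving the `with` block runs the pipeline and waits for it to finish.
    with beam.Pipeline(runner="DirectRunner") as p:
        (p
         | beam.Create(range(n))
         | beam.Map(lambda x: x * x)
         | beam.CombineGlobally(sum))
```

Calling `elapsed, report = profile_callable(run_simple_pipeline)` and printing `report` should list the runner functions that dominate the run, which gives a starting point for reading `Pipeline.run` and `DirectRunner.run`. A real benchmark would likely need larger inputs and repeated runs to produce stable numbers.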



