beam-user mailing list archives

From Claire Yuan <clairey...@yahoo-inc.com>
Subject Re: Two example pipelines built by Yahoo intern
Date Wed, 09 Aug 2017 23:29:27 GMT
Hi,

Thank you so much for your comments! They were really helpful in improving our work :)
To answer Jesse's question: we based our code on the examples from Beam and did not
notice any lambda expressions there. It was surprising for us to see Java written in this
functional and generic coding style when using the Beam API, but after getting used to it,
its convenience did amaze us.

Claire

    On Tuesday, August 8, 2017 4:53 PM, Eugene Kirpichov <kirpichov@google.com> wrote:
 

 +Aljoscha Krettek for comments on Flink runner
+Thomas Weise likewise for Apex runner

On Tue, Aug 8, 2017 at 4:52 PM Eugene Kirpichov <kirpichov@google.com> wrote:

Hi Claire,

Thank you - happy to see a paper with such a detailed description of your experience with
both usability of Beam per se and the execution on the Flink runner! The paper looks well-written,
and, from a quick look at the code, it seems to be using the Beam API properly without obvious
opportunities for large improvement. Great work!

A couple of suggestions:
- I think it would be useful to mention explicitly in the paper abstract / introduction that
  you are testing Flink and Apex runners, and mention which other runners are currently
  available, and mention why you're testing specifically Flink and Apex. This would be useful
  to people reading the paper without much background in Beam, who might not realize that
  Beam has many different runners with potentially very different performance or level of
  support for features.
- As a member of the Dataflow team, I'm curious :) Have you considered also benchmarking
  these pipelines on the Dataflow runner? (especially streaming)
- For the issues you found that are clearly not "intended behavior" (e.g. unacceptably low
  performance in streaming mode; pipelines not working at all with Apex runner, etc.), would
  it be possible to add JIRA IDs to the paper, so that people who read the paper later can
  look at the JIRA and see if it was already resolved?

Thanks.
On Tue, Aug 8, 2017 at 3:46 PM Jesse Anderson <jesse@bigdatainstitute.io> wrote:

Claire,
Interesting work.
In section 5, you talk about the Java language being difficult. Was there a reason you didn't
use Java lambdas for your work?
Thanks,
Jesse
On Tue, Aug 8, 2017 at 3:40 PM Claire Yuan <claireyuan@yahoo-inc.com> wrote:

Hi folks,

We are a two-member team interning at Yahoo! Inc., currently evaluating the performance
and functionality of the Beam API. We built two pipelines with the Beam API, using the default
examples as a reference: one for sentiment analysis and one for flight performance analysis.
Attached are the code for the two pipelines and a README with instructions on how to run
them in our framework. We would like to share them with you. There is also a paper we wrote
about our evaluation results and our experience using Beam over the last two months of the
internship. It would be a great help if you could have a look at it and share any comments
with us. Thanks!

-- 
Thanks,
Jesse



   