storm-dev mailing list archives

From "刘键(Basti Liu)" <basti...@alibaba-inc.com>
Subject RE: Apache Storm/JStorm Runner(s) for Apache Beam
Date Wed, 12 Apr 2017 05:35:02 GMT
Hi Taylor,

Glad to hear your opinion.
Since Beam was open-sourced, there has been a lot of interest in it from our internal users
at Alibaba and from other companies in China, which prompted us to provide a JStorm runner.
But since the implementation of the Storm runner is out of date, and over the past year
many new features and different solutions (especially for exactly-once and state) were introduced
in JStorm, we had to start separate development of the JStorm runner.
Currently, we have finished a prototype (supporting most PTransforms, windows and triggers of Beam),
as Pei mentioned in another email, and full testing is still ongoing. Some users at Alibaba
have built trial topologies on it. But for further improvement, we still need review help
from the Beam community to ensure correctness, and to be notified of any breaking or
incompatible changes as Beam evolves. That is why we decided to commit the JStorm runner
to the Beam repository.

In my personal understanding, the JStorm runner is not a duplicated effort. The major part of
the JStorm runner can probably be reused in Storm. Some other parts, like exactly-once and state,
need to be propagated to Storm. When the Storm community plans to restart development of the
Storm runner, we'd like to help with this, as part of the previously planned merge of JStorm
features. At that time, we can discuss whether merging the JStorm features or propagating them
is required.
Looking forward to better collaboration between Beam, Storm and JStorm.

Regards
Jian Liu(Basti)

-----Original Message-----
From: P. Taylor Goetz [mailto:ptgoetz@apache.org] 
Sent: Tuesday, April 11, 2017 1:48 AM
To: dev@beam.apache.org; dev@storm.apache.org
Subject: Apache Storm/JStorm Runner(s) for Apache Beam

Note: cross-posting to dev@beam and dev@storm

I’ve seen at least two threads on the dev@ list discussing the JStorm runner and my hope
is we can expand on that discussion and cross-pollinate with the Storm/JStorm/Beam communities
as well.

A while back I created a very preliminary proof of concept of getting a Storm Beam runner
working [1]. That was mainly an exercise for me to familiarize myself with the Beam API and
discover what it would take to develop a Beam runner on top of Storm. That code is way out
of date (I was targeting Beam’s HEAD before the 0.2.0 release, and a lot of changes have
since taken place) and didn’t really work as Jian Liu pointed out. It was a start, that
perhaps could be further built upon, or parts harvested, etc. I don’t have any particular
attachment to that code and wouldn’t be upset if it were completely discarded in favor of
a better or more extensible implementation.

What I would like to see, and I think this is a great opportunity to do so, is a closer collaboration
between the Apache Storm and JStorm communities. For those who aren’t familiar with those
projects’ relationship, I’ll start with a little history…

JStorm began at Alibaba as a fork of Storm (pre-Apache?) with Storm’s Clojure code reimplemented
in Java. The rationale behind that move was that Alibaba had a large number of Java developers
but very few who were proficient with Clojure. Moving to pure Java made sense as it would
expand the base of potential contributors.

In late 2015 Alibaba donated the JStorm codebase to the Apache Storm project, and the Apache
Storm PMC committed to converting its Clojure code to Java in order to incorporate the code
donation. At the time there was one catch — Apache Storm had implemented comprehensive security
features such as Kerberos authentication/authorization and multi-tenancy in its Clojure code,
which greatly complicated the move to Java and incorporation of the JStorm code. JStorm did
not have the same security features. A number of JStorm developers have also become Storm
PMC members.

Fast forward to today. The Storm community has completed the bulk of the move to Java and
the next major release (presumably 2.0, which is currently under discussion) will be largely
Java-based. We are now in a much better position to begin incorporating JStorm’s features,
as well as implementing new features necessary to support the Beam API (such as support for
bounded pipelines, among other features).

Having separate Apache Storm and JStorm Beam runner implementations doesn’t feel appropriate
in my personal opinion, especially since both projects have expressed an ongoing commitment
to bringing JStorm’s additional features, and just as important, community, to Apache Storm.

One final note: when the Storm community initially discussed developing a Beam runner, the
general consensus was to do so within the Storm repository. My current thinking is that such
an effort should take place within the Beam community, not only since that is the development
pattern followed by other runner implementations (Flink, Apex, etc.), but also because it
would serve to increase collaboration between Apache projects (always a good thing!).

I would love to hear opinions from others in the Storm/JStorm/Beam communities.

-Taylor

