flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Storm Compatibility
Date Sat, 14 Nov 2015 20:54:10 GMT
I would be against adding anything Storm-specific to the core Flink APIs
(streaming counts as core as well). If we add stuff there we have to stick to
it, and I don’t see a lot of use for reusing single Bolts/Spouts.

I’m very excited about the work on Storm compatibility in general, though. :D

> On 14 Nov 2015, at 17:19, Matthias J. Sax <mjsax@apache.org> wrote:
> About DataStream extension and setting storm dependency to provided. If
> this works, a big +1 from my side.
> -Matthias
> On 11/14/2015 05:13 PM, Matthias J. Sax wrote:
>> I just had a look at your proposal. It makes a lot of sense. I still
>> believe that it is a matter of taste whether one prefers your or my point
>> of view. Both approaches allow easily reusing and executing Storm
>> topologies on Flink (which is the most important feature we need to have).
>> I hope to get some more feedback from the community on whether the
>> Storm compatibility should be more "stormy" or more "flinky". Both
>> approaches make sense to me.
>> A few minor comments:
>> * FileSpout vs FiniteFileSpout
>>  -> FileSpout was implemented in a Storm way -- to set the "finished"
>> flag here does not make sense from a Storm point of view (there is no
>> such thing as a finite spout)
>>  Thus, this example shows how a regular Storm spout can be improved
>> using the FiniteSpout interface -- I would keep it as is (even if it
>> seems unnecessarily complicated -- imagine that you don't have the code
>> of FileSpout)
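The distinction can be sketched with simplified stand-in types. Note that `Spout`, `FiniteSpout`, `LineSpout`, and `FiniteLineSpout` below are hypothetical simplifications for illustration, not the real Storm or flink-storm interfaces:

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-ins for the interfaces discussed above.
interface Spout { String nextTuple(); }                       // Storm-style: runs forever
interface FiniteSpout extends Spout { boolean reachedEnd(); } // adds an end-of-input signal

// A source written "the Storm way": it has no notion of being finished
// and simply emits null once its input is exhausted.
class LineSpout implements Spout {
    protected final Iterator<String> lines;
    LineSpout(List<String> lines) { this.lines = lines.iterator(); }
    public String nextTuple() { return lines.hasNext() ? lines.next() : null; }
}

// The improvement under discussion: leave LineSpout untouched and add
// finiteness only in a subclass that implements FiniteSpout.
class FiniteLineSpout extends LineSpout implements FiniteSpout {
    FiniteLineSpout(List<String> lines) { super(lines); }
    public boolean reachedEnd() { return !lines.hasNext(); }
}
```

This mirrors the point above: the base spout stays a valid, unmodified Storm-style source, and the finite behavior is layered on top.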
>> * You changed examples to use finite-spouts -- from a testing point of
>> view this makes sense. However, the examples should show how to run an
>> *unmodified* Storm topology in Flink.
>> * We should keep the local copy "unprocessedBolts" when creating a Flink
>> program, to allow re-submitting the same topology object (or altering it
>> after submission). Without the copy, translating the topology into a
>> Flink job alters the object, which should not happen. And as this path is
>> not performance critical, the copying overhead does not matter.
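The defensive-copy argument can be illustrated with a toy translator. `ToyTopology` below is a hypothetical stand-in, not the real flink-storm code:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the translation step discussed above: the builder holds the
// declared bolts, and translation "consumes" them while wiring up the job.
class ToyTopology {
    final Map<String, Object> bolts = new HashMap<>();

    // Translate WITHOUT mutating this object: work on a local copy, so the
    // same topology can be submitted twice or altered after submission.
    Map<String, Object> translate() {
        Map<String, Object> unprocessedBolts = new HashMap<>(bolts);
        Map<String, Object> plan = new HashMap<>();
        while (!unprocessedBolts.isEmpty()) {
            String id = unprocessedBolts.keySet().iterator().next();
            plan.put(id, unprocessedBolts.remove(id)); // consumes the copy only
        }
        return plan;
    }
}
```

Translating twice yields the same plan because only the local copy is drained; without the copy, the second translation would see an empty topology.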
>> * Why did you change the dop from 4 to 1 in WordCountTopology? We should
>> test in a parallel fashion...
>> * Too many reformatting changes ;) You touched many classes without any
>> actual code changes.
>> -------- Forwarded Message --------
>> Subject: Re: Storm Compatibility
>> Date: Fri, 13 Nov 2015 12:15:19 +0100
>> From: Maximilian Michels <mxm@apache.org>
>> To: Matthias J. Sax <mjsax@apache.org>
>> CC: Stephan Ewen <sewen@apache.org>, Robert Metzger <rmetzger@apache.org>
>> Hi Matthias,
>> Thank you for your remarks.
>> I believe the goal of the compatibility layer should not be to mimic
>> Storm's API but to easily execute Storm topologies using Flink. I see
>> that it is convenient for users to execute with the class names they
>> know from Storm, but I think this makes the API verbose. I've refactored
>> it a bit to make it more aligned with Flink's execution model. After all,
>> the most important thing is that it makes it easy for people to reuse
>> Storm topologies while getting all the advantages of Flink.
>> Let me explain what I have done so far:
>> https://github.com/apache/flink/compare/master...mxm:storm-dev
>> API
>> - remove FlinkClient, FlinkSubmitter, FlinkLocalCluster,
>> FlinkTopology: They are not necessary in my opinion and are
>> replicating functionality already included in Flink or Storm.
>> - Build the topology with the Storm TopologyBuilder (instead of
>> FlinkTopology) which is then passed to the FlinkTopologyBuilder which
>> generates the StreamExecutionEnvironment containing the StreamGraph.
>> You can then simply call execute() like you would usually do in Flink.
>> This lets you reuse your Storm topologies with the ease of Flink's
>> context-based execution mechanism. Note that it works in local and
>> remote execution mode without changing any code.
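The described flow can be sketched with stand-in classes. `ToyTopologyBuilder` and `ToyFlinkTranslator` below are hypothetical simplifications of the Storm `TopologyBuilder` and the `FlinkTopologyBuilder` mentioned above; the real class names and signatures in the branch may differ:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Storm's TopologyBuilder: collects the declared components.
class ToyTopologyBuilder {
    final List<String> components = new ArrayList<>();
    ToyTopologyBuilder add(String id) { components.add(id); return this; }
}

// Stand-in for the translator described above: takes the Storm-built
// topology and produces something you simply call execute() on, as in Flink.
class ToyFlinkTranslator {
    private final List<String> plan;
    ToyFlinkTranslator(ToyTopologyBuilder builder) {
        this.plan = new ArrayList<>(builder.components); // translate a copy
    }
    String execute() { return "executed " + plan.size() + " components"; }
}
```

The user-facing flow is then: build with the Storm-style builder, hand it to the translator, call execute(), with no code change between local and remote mode.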
>> Tests
>> - replaced StormTestBase.java with StreamingTestBase
>> - used a finite source for the tests and changed it a bit
>> Examples
>> - Convert examples to new API
>> - Remove duplicate examples (local and remote)
>> I hope these changes are not too invasive for you. I think it makes
>> the compatibility layer much easier to use. Let me know what you think
>> about it. Of course, we can iterate on it.
>> About the integration of the compatibility layer into DataStream:
>> Wouldn't it be possible to set the storm dependency to provided and let the user
>> include the jar if he/she wants to use the Storm compatibility? That's
>> also what we do for other libraries like Gelly. You have to package
>> them into the JAR if you want to run them on the cluster. We should
>> give a good error message if classes cannot be found.
>> +1 for moving the discussion to the dev list.
>> Cheers,
>> Max
>> On Fri, Nov 13, 2015 at 7:41 AM, Matthias J. Sax <mjsax@apache.org> wrote:
>>> One more thing that just came to my mind about (1): I have to correct my
>>> last reply on it:
>>> We **cannot reuse** TopologyBuilder because the returned StormTopology
>>> from .createTopology() does **not** contain the references to the
>>> Spout/Bolt object. Internally, those are already serialized into an
>>> internal Thrift representation (as preparation to get sent to Nimbus).
>>> However, in order to create a Flink job, we need the references of course...
>>> -Matthias
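The underlying issue can be demonstrated with plain Java serialization standing in for Thrift. `SerializingBuilder` below is hypothetical, not Storm code:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// A builder that, like Storm's TopologyBuilder preparing a Thrift
// StormTopology, serializes components eagerly: afterwards only bytes remain.
class SerializingBuilder {
    final Map<String, byte[]> serialized = new HashMap<>();

    void setBolt(String id, Serializable bolt) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(bolt);
            oos.flush();
            serialized.put(id, bos.toByteArray()); // the live reference is gone
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

A Flink job, by contrast, needs the live Spout/Bolt objects in order to wrap them as operators, so the references have to be captured before this serialization step.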
>>> On 11/11/2015 04:33 PM, Maximilian Michels wrote:
>>>> Hi Matthias,
>>>> Sorry for getting back to you late. I'm very new to Storm but have
>>>> familiarized myself a bit over the last few days. While looking through the
>>>> Storm examples and the compatibility layer I discovered the following
>>>> issues:
>>>> 1) The compatibility layer mirrors the Storm API instead of reusing
>>>> it. Why do we need a FlinkTopologyBuilder, FlinkCluster,
>>>> FlinkSubmitter, FlinkClient? Couldn't all these user-facing classes be
>>>> replaced by e.g. StormExecutionEnvironment which receives the Storm
>>>> topology and upon getStreamGraph() just traverses it?
>>>> 2) DRPC is not yet supported. I don't know how crucial this is but it
>>>> seems to be a widespread Storm feature. If we wrapped the entire Storm
>>>> topology, we could give appropriate errors when we see such
>>>> unsupported features.
>>>> 3) We could simplify embedding Spouts and Bolts directly as operator
>>>> functions. Users shouldn't have to worry about extracting the types.
>>>> Perhaps we could implement a dedicated method to add spouts/bolts on
>>>> DataStream?
>>>> 4) Performance: The BoltWrapper creates a StormTuple for every
>>>> incoming record. I think this could be improved. Couldn't we use the
>>>> StormTuple as data type instead of Flink's tuples?
>>>> 5) Trident Examples. Have you run any?
>>>> That's it for now. I'm sure you know about many more improvements or
>>>> problems because you're the expert on this. In the meantime, I'll try
>>>> to contact you via IRC.
>>>> Cheers,
>>>> Max
>>>> On Fri, Nov 6, 2015 at 6:25 PM, Matthias J. Sax <mjsax@apache.org> wrote:
>>>>> Hi,
>>>>> that sounds great! I am very happy that people are interested in it and
>>>>> start to use it! Can you give some more details about this? I am just
>>>>> aware of a few questions on SO. But there was no question about it on the
>>>>> mailing list lately... Did you get some more internal questions/feedback?
>>>>> And of course, other people should get involved as well! There is so
>>>>> much to do -- even if I work 40h a week on it, I cannot get everything
>>>>> done by myself. The last days were very busy for me. I hope I can work
>>>>> on a couple of bugs after the Munich Meetup. I started to look into them
>>>>> already...
>>>>> Should we start a roadmap in the Wiki? This might be helpful if more
>>>>> people get involved.
>>>>> And thanks for keeping me in the loop :)
>>>>> -Matthias
>>>>> On 11/06/2015 03:49 PM, Stephan Ewen wrote:
>>>>>> Hi Matthias!
>>>>>> We are seeing a lot of people getting very excited about the Storm
>>>>>> Compatibility layer. I expect that quite a few people will seriously
>>>>>> start to work with it.
>>>>>> I would suggest that we also start getting involved in that. Since
>>>>>> your priority is of course your Ph.D., it would be a bit much to ask
>>>>>> you to dedicate a lot of time to supporting more features, being
>>>>>> super responsive with users all the time, etc.
>>>>>> To that end, some people from us will start testing the API, adding
>>>>>> fixes, etc. (which also helps us to understand this better when users
>>>>>> have questions).
>>>>>> We would definitely like you to stay involved (we don't want to
>>>>>> hijack this), and to help with ideas, especially when it comes to things
>>>>>> like fault tolerance design, etc.
>>>>>> What do you think?
>>>>>> Greetings,
>>>>>> Stephan
