flink-user mailing list archives

From Juan Rodríguez Hortalá <juan.rodriguez.hort...@gmail.com>
Subject Re: Execution environments for testing: local vs collection vs mini cluster
Date Sun, 04 Aug 2019 22:06:57 GMT
Hi,

Thanks for your answer. I hadn't noticed that the collection environment
only works for the batch API. It's also good to know that the mini cluster
is more of an internal tool. The local execution environments for batch
and streaming are working very well for me, so I was just curious. Thanks
for the clarifications.

Greetings,

Juan


On Fri, Jul 26, 2019 at 1:32 AM Biao Liu <mmyy1110@gmail.com> wrote:

> Hi Juan,
>
> Sorry for the late reply.
>
> 1. The environments for data stream and data set are not the same. An
> obvious difference is that the data stream environments always have a
> "Stream" prefix. For example, StreamExecutionEnvironment is for data
> stream, while ExecutionEnvironment and CollectionEnvironment are for
> data set.
>
> You could use "StreamExecutionEnvironment.createLocalEnvironment" to run
> or test a data stream job. Use ExecutionEnvironment.createLocalEnvironment
> or CollectionEnvironment to run or test a data set job.
>
> Actually, you could also use
> StreamExecutionEnvironment.getExecutionEnvironment
> or ExecutionEnvironment.getExecutionEnvironment, because they choose a
> local environment automatically if you are running the job standalone (in
> an IDE or by executing the main method directly).
>
> 2. Regarding MiniCluster, IMO it's a bit internal. The MiniCluster runs
> as the backend behind the local environment. I think the mini cluster of
> Flink and the mini cluster of Hadoop are positioned slightly differently.
>
> 3. I will try to answer your questions below.
>
> > Which test execution environment is recommended for each test use case?
> It depends on which mode you are testing, data stream or data set.
>
> > For example, I don't see why I would use CollectionEnvironment when I
> have the local environment available and running on several threads. What
> is a good use case for CollectionEnvironment?
> The official documentation says "CollectionEnvironment is a low-overhead
> approach for executing Flink programs". As I don't have much experience
> with data set, I just checked the relevant code. The CollectionEnvironment
> does not seem to start a mini cluster, so I believe it executes the job in
> a lighter way. BTW, there is no equivalent environment for data stream.
>
> > Are all these 3 environments equally supported, or are some of them
> expected to be deprecated?
> Obviously they are not the same, as mentioned above.
> If a class is deprecated, it is marked with the @Deprecated annotation.
>
> > Are there any additional execution environments that could be useful for
> testing on a single host?
> I would suggest following the official documents [1][2], which you have
> already discovered, even though there might be other ways that seem to be
> equivalent. If you depend on some internal implementation, it might
> change over time without any notice.
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/testing.html#integration-testing
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/local_execution.html
>
>
> On Tue, Jul 23, 2019 at 11:30 PM Juan Rodríguez Hortalá <
> juan.rodriguez.hortala@gmail.com> wrote:
>
>> Hi Biao,
>>
>> Thanks for your answer.
>>
>> 1. Integration tests for my project.
>> 2. Both data stream and data sets
>>
>>
>>
>> On Mon, Jul 22, 2019 at 11:44 PM Biao Liu <mmyy1110@gmail.com> wrote:
>>
>>> Hi Juan,
>>>
>>> I'm not sure what you really want. Before giving some suggestions, could
>>> you answer the questions below first?
>>>
>>> 1. Do you want to write a unit test (or integration test) case for your
>>> project or for Flink? Or do you just want to run your job locally?
>>> 2. Which mode do you want to test? DataStream or DataSet?
>>>
>>>
>>>
>>> On Tue, Jul 23, 2019 at 1:12 PM Juan Rodríguez Hortalá <
>>> juan.rodriguez.hortala@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> In
>>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/local_execution.html
>>>> and
>>>> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/runtime/minicluster/MiniCluster.html
>>>> I see there are 3 ways to create an execution environment for testing:
>>>>
>>>>    - StreamExecutionEnvironment.createLocalEnvironment and
>>>>    ExecutionEnvironment.createLocalEnvironment create an execution environment
>>>>    running on a single JVM using different threads.
>>>>    - CollectionEnvironment runs on a single JVM on a single thread.
>>>>    - I haven't found much documentation on the Mini Cluster, but
>>>>    it sounds similar to the Hadoop MiniCluster
>>>>    <https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CLIMiniCluster.html>.
>>>>    If that is the case, then it would run on many local JVMs, each of them
>>>>    running multiple threads.
>>>>
>>>> Am I correct about the Mini Cluster? Is there any additional
>>>> documentation about it? I discovered it looking at the source code of
>>>> AbstractTestBase, that is mentioned on
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/testing.html#integration-testing.
>>>> Also, it looks like launching the mini cluster registers it somewhere, so
>>>> subsequent calls to `StreamExecutionEnvironment.getExecutionEnvironment`
>>>> return an environment that uses the mini cluster. Is that performed by
>>>> `executionEnvironment.setAsContext()` in
>>>> https://github.com/apache/flink/blob/master/flink-test-utils-parent/flink-test-utils/src/main/java/org/apache/flink/test/util/MiniClusterWithClientResource.java#L56
>>>> ? Is that execution environment registration process documented anywhere?
>>>>
>>>> Which test execution environment is recommended for each test use case?
>>>> For example, I don't see why I would use CollectionEnvironment when I have
>>>> the local environment available and running on several threads. What is a
>>>> good use case for CollectionEnvironment?
>>>>
>>>> Are all these 3 environments equally supported, or are some of them
>>>> expected to be deprecated?
>>>>
>>>> Are there any additional execution environments that could be useful
>>>> for testing on a single host?
>>>>
>>>> Thanks,
>>>>
>>>> Juan
>>>>
>>>>
>>>>
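
The original question in this thread asks whether launching the mini
cluster registers it somewhere so that later calls to
`StreamExecutionEnvironment.getExecutionEnvironment` pick it up, and the
reply notes that `getExecutionEnvironment` falls back to a local environment
when a job is run standalone. A rough way to picture both behaviours
together is a static factory slot that a test harness fills in and that
`getExecutionEnvironment` consults first. The sketch below is only a toy
model under that assumption: `ContextEnvSketch`, `Env`, `Envs` and the
"mini-cluster" label are hypothetical names made up for illustration, it
uses no Flink classes, and it should not be read as Flink's actual
implementation.

```java
import java.util.function.Supplier;

// Toy model only: every class here is a hypothetical stand-in, not a Flink class.
public class ContextEnvSketch {

    /** Stand-in for an execution environment; it only records where jobs would run. */
    static final class Env {
        final String backend;

        Env(String backend) {
            this.backend = backend;
        }
    }

    /** Stand-in for the static factory methods of an environment class. */
    static final class Envs {
        // Filled in by a test harness; null means "no context registered".
        private static Supplier<Env> contextFactory = null;

        // Assumed analogue of the registration step asked about in the thread.
        static void setAsContext(Supplier<Env> factory) {
            contextFactory = factory;
        }

        // Clears the registration, e.g. when the test harness shuts down.
        static void unsetAsContext() {
            contextFactory = null;
        }

        static Env createLocalEnvironment() {
            return new Env("local");
        }

        // Prefer a registered context; otherwise fall back to a local environment.
        static Env getExecutionEnvironment() {
            return contextFactory != null
                    ? contextFactory.get()
                    : createLocalEnvironment();
        }
    }

    public static void main(String[] args) {
        // Nothing registered yet, so the fallback is used.
        System.out.println(Envs.getExecutionEnvironment().backend);

        // A harness registers a factory; later lookups use it.
        Envs.setAsContext(() -> new Env("mini-cluster"));
        System.out.println(Envs.getExecutionEnvironment().backend);

        // After cleanup, lookups fall back to local again.
        Envs.unsetAsContext();
        System.out.println(Envs.getExecutionEnvironment().backend);
    }
}
```

In this model, test code that always calls `getExecutionEnvironment` does
not need to know whether a harness is active, which may be why relying on
that call rather than on internal classes is the safer choice suggested in
the reply.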
