flink-issues mailing list archives

From "Timo Walther (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (FLINK-3656) Rework Table API tests
Date Thu, 16 Nov 2017 10:45:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Timo Walther resolved FLINK-3656.
       Resolution: Fixed
    Fix Version/s: 1.4.0

This issue has mostly been fixed as part of FLINK-6617. Improving the tests is a continuous
process. Therefore, we will close this issue for now.

> Rework Table API tests
> ----------------------
>                 Key: FLINK-3656
>                 URL: https://issues.apache.org/jira/browse/FLINK-3656
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: Vasia Kalavri
>              Labels: starter
>             Fix For: 1.4.0
> The {{flink-table}} component consists of 
> several APIs 
> 	* Scala-embedded Table API
> 	* String-based Table API (for Java)
> 	* SQL 
> and compiles to two execution backends:
> 	* DataStream API
> 	* DataSet API
> There are many different translation paths involved until a query is executed:
> 	# Table API String -> Table API logical plan
> 	# Table API Scala-expressions -> Table API logical plan
> 	# Table API logical plan -> Calcite RelNode plans
> 	# SQL -> Calcite RelNode plans (done exclusively via Calcite)
> 	# Calcite RelNodes -> DataSet RelNodes
> 	# DataSet RelNodes -> DataSet program
> 	# Calcite RelNodes -> DataStream RelNodes
> 	# DataStream RelNodes -> DataStream program
> 	# Calcite RexNode expressions -> generated code
> which need to be thoroughly tested.
> Initially, many tests were done as end-to-end integration tests with high overhead.
> However, due to the combinations of APIs and execution back-ends, this approach causes
many redundant tests and long build times.
> Therefore, I propose the following testing scheme:
> 1. Table API String -> Table API expression: 
> The String-based Table API is tested by comparing the resulting logical plan (Table.logicalPlan)
to the logical plan of an equivalent Table program that uses the Scala-embedded syntax. The
logical plan is the Table API internal representation which is later converted into a Calcite
RelNode plan.
> All existing integration tests that check the "Java" Table API should be ported to unit
tests. Some of the existing tests will also turn out to be duplicates, because the Java Table API
is currently tested for both batch and streaming, which is no longer necessary.
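
The comparison described in 1. can be sketched with a toy model. The {{Expr}} class below is hypothetical and is not part of Flink; it only illustrates the idea that a string front end and an embedded front end should produce structurally equal trees, so that a plain equality check can replace an end-to-end run.

```java
import java.util.Objects;

// Toy model only: Expr is a hypothetical stand-in, not a Flink class.
final class Expr {
    final String op;    // "field" or "plus"
    final String name;  // field name, null for "plus"
    final Expr left;
    final Expr right;

    private Expr(String op, String name, Expr left, Expr right) {
        this.op = op;
        this.name = name;
        this.left = left;
        this.right = right;
    }

    // "Embedded" front end: expressions are built through method calls.
    static Expr field(String name) {
        return new Expr("field", name, null, null);
    }

    Expr plus(Expr other) {
        return new Expr("plus", null, this, other);
    }

    // "String" front end: parses left-associative sums such as "a + b + c".
    static Expr parse(String text) {
        String[] parts = text.split("\\+");
        Expr result = field(parts[0].trim());
        for (int i = 1; i < parts.length; i++) {
            result = result.plus(field(parts[i].trim()));
        }
        return result;
    }

    // Structural equality is what makes the two trees comparable.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Expr)) {
            return false;
        }
        Expr e = (Expr) o;
        return op.equals(e.op)
            && Objects.equals(name, e.name)
            && Objects.equals(left, e.left)
            && Objects.equals(right, e.right);
    }

    @Override
    public int hashCode() {
        return Objects.hash(op, name, left, right);
    }
}

final class PlanEqualityDemo {
    public static void main(String[] args) {
        Expr fromString = Expr.parse("a + b + c");
        Expr fromEmbedded = Expr.field("a").plus(Expr.field("b")).plus(Expr.field("c"));
        if (!fromString.equals(fromEmbedded)) {
            throw new AssertionError("front ends produced different trees");
        }
        System.out.println("trees match");
    }
}
```

Because the check never executes the query, one such test covers the string front end regardless of whether the plan later runs as batch or streaming.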
> 2. Table API Scala-expressions -> Table API logical plan -> Calcite RelNodes ->
DataSet RelNodes / DataStream RelNodes
> These tests cover the translation and optimization of Table API queries and verify the
Calcite optimized plan. We need distinct tests for DataSet and DataStream environments since
features and translation rules vary. These tests will also identify if added or modified rules
or cost functions result in different plans. These should be the main tests for the Table
API and very extensive. 
> These tests should be implemented by extending the {{TableTestBase}} which is a base
class for unit tests and hence very lightweight.
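
As a rough illustration of the plan-verification style described in 2., the sketch below applies one optimization rule to a tiny plan and compares the rendered result with an expected string. {{PlanNode}}, its filter-merging rule, and the output format are all invented for this example; they do not reflect {{TableTestBase}} or Calcite.

```java
// Toy model only: PlanNode and its single rule are hypothetical.
final class PlanNode {
    final String kind;    // e.g. "Scan" or "Filter"
    final String detail;  // table name or predicate
    final PlanNode input; // null for a leaf

    PlanNode(String kind, String detail, PlanNode input) {
        this.kind = kind;
        this.detail = detail;
        this.input = input;
    }

    // One rewrite rule: a Filter directly on top of a Filter becomes one Filter.
    static PlanNode optimize(PlanNode node) {
        if (node == null) {
            return null;
        }
        PlanNode in = optimize(node.input);
        if (node.kind.equals("Filter") && in != null && in.kind.equals("Filter")) {
            return new PlanNode("Filter", in.detail + " AND " + node.detail, in.input);
        }
        return new PlanNode(node.kind, node.detail, in);
    }

    // Renders the plan top-down, one node per line, indented by depth.
    static String explain(PlanNode node) {
        StringBuilder sb = new StringBuilder();
        int depth = 0;
        for (PlanNode n = node; n != null; n = n.input) {
            for (int i = 0; i < depth; i++) {
                sb.append("  ");
            }
            sb.append(n.kind).append('(').append(n.detail).append(")\n");
            depth++;
        }
        return sb.toString();
    }
}

final class PlanVerificationDemo {
    public static void main(String[] args) {
        PlanNode query = new PlanNode("Filter", "b < 5",
            new PlanNode("Filter", "a > 1",
                new PlanNode("Scan", "T", null)));
        String expected = "Filter(a > 1 AND b < 5)\n  Scan(T)\n";
        String actual = PlanNode.explain(PlanNode.optimize(query));
        if (!actual.equals(expected)) {
            throw new AssertionError("unexpected plan:\n" + actual);
        }
        System.out.println(actual);
    }
}
```

A test of this shape fails as soon as a changed rule produces a different plan, which is the property the scheme relies on.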
> 3. SQL -> Calcite RelNodes -> DataSet RelNodes / DataStream RelNodes
> These are the same tests as described for 2. (Table API Scala-expressions -> DataSet
/ DataStream RelNodes) but just for SQL.
> 4. DataSet RelNode -> DataSet program
> Unfortunately, the DataSet API lacks a good mechanism to test generated programs, i.e.,
get a traversable plan of all operators with access to all user-defined functions. Until such
a testing utility is available, I propose to test the translation to DataSet programs as end-to-end
integration tests. However, I think we can run most tests on a Collection ExecutionEnvironment,
which does not start a Flink cluster but runs all code on Java collections. This makes these
tests much more lightweight than cluster-based ITCases. The goal of these tests should be
to cover all translation paths from DataSetRel to DataSet program, i.e., all DataSetRel nodes
and their translation logic. These tests should be implemented by extending the {{TableProgramsCollectionTestBase}}
(see FLINK-5268).
> Moreover, we should have very few cluster-based ITCases in place that check the execution
path with the actual operators, serializers, and comparators. However, we should limit these
tests to the minimum to keep build time low. These tests should be implemented by extending
the {{TableProgramsClusterTestBase}} (FLINK-5268) and all be located in the same class to
avoid repeated instantiation of the Flink MiniCluster.
> 5. DataStream RelNode -> DataStream program
> Here basically the same applies as for the DataSet programs. I'm not aware of a good
way to test generated DataStream programs without executing them. A testing utility would
be great for all libraries that are built on top of the API. Until then, I propose to use
end-to-end integration tests. Unfortunately, the DataStream API does not feature a collection
execution mode, so all tests need to be run on a MiniCluster. Therefore, we should again keep
these tests to the minimum. These tests should be implemented by extending the {{StreamingMultipleProgramsTestBase}}
and be located in a few classes to avoid repeated instantiations of the Flink MiniCluster.
> 6. (Scala expressions | String-parsed expressions | SQL expressions) -> RexNode expressions
-> Generated Code
> In order to avoid extensive optimization tests for each supported expression or built-in
function, we have the {{ExpressionTestBase}} which compiles expressions into generated code
and tests for the correctness of results. All supported expressions and built-in functions
should be tested by extending the {{ExpressionTestBase}} instead of running a full integration test.
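
The result-based checking described in 6. might look roughly like the following. {{ExpressionChecks}} is a hypothetical helper, not the real {{ExpressionTestBase}}: plain Java functions stand in for the code generated from each front end, and every variant is evaluated against the same input row and compared with one expected string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Toy model only: ExpressionChecks is hypothetical and unrelated to Flink's classes.
final class ExpressionChecks {
    private final Object[] row;
    private final List<String> failures = new ArrayList<>();

    ExpressionChecks(Object... row) {
        this.row = row;
    }

    // Evaluates every variant on the shared row and records any mismatch.
    @SafeVarargs
    final ExpressionChecks check(String expected, Function<Object[], Object>... variants) {
        for (int i = 0; i < variants.length; i++) {
            String actual = String.valueOf(variants[i].apply(row));
            if (!actual.equals(expected)) {
                failures.add("variant " + i + ": expected " + expected + " but got " + actual);
            }
        }
        return this;
    }

    List<String> failures() {
        return failures;
    }
}

final class ExpressionChecksDemo {
    public static void main(String[] args) {
        ExpressionChecks checks = new ExpressionChecks(3, 4)
            // Two variants of the same expression, standing in for two front ends.
            .check("7",
                r -> (Integer) r[0] + (Integer) r[1],
                r -> Integer.sum((Integer) r[0], (Integer) r[1]));
        if (!checks.failures().isEmpty()) {
            throw new AssertionError(checks.failures().toString());
        }
        System.out.println("all variants agree");
    }
}
```

Collecting failures instead of stopping at the first one keeps a single run informative when several expressions are checked together.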
> I will add a few JIRAs to migrate existing tests to the new testing scheme.

This message was sent by Atlassian JIRA
