beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-160) Port 'NexMark Queries' to Beam for use as integration test
Date Fri, 12 May 2017 08:46:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007806#comment-16007806 ]

ASF GitHub Bot commented on BEAM-160:
-------------------------------------

GitHub user echauchot opened a pull request:

    https://github.com/apache/beam/pull/3114

    [BEAM-160] Port 'NexMark Queries' to Beam for use as integration test

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:
    
     - [X] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [X] Make sure tests pass via `mvn clean verify`.
     - [X] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [X] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
    
    ---
    R: @dpmills @dhalperi
    CC: @stasl @aviemzur @aljoscha  because we discussed NexMark together :)
    CC: @mshields822 I know you do not work on it anymore, but you might be interested :)
    CC: @ssisk for reflection on the IT tests
    
    This is a port of the NexMark queries to Beam, to be used as integration tests.
    They can also be used for A/B testing (regression checks or performance comparisons between two versions of the same engine or of the same runner).
    
    This is a continuation of the previous PR (https://github.com/apache/beam/pull/99) from Mark Shields.
    The code has changed quite a bit: some queries now use new Beam APIs, and there were some big refactors. More importantly, we can now run all the queries on all the runners.
    Nevertheless, there are still some open issues in Nexmark (https://github.com/iemejia/beam/issues) and in Beam upstream (see the issue links in https://issues.apache.org/jira/browse/BEAM-160).
    
    Here is a doc that presents the NexMark components and pseudocode of the queries to ease the review: https://drive.google.com/open?id=1VgnGiVu8vSfm7Et-xAtQYv0PlEpqeyfmhpQUNPmWRJs
    
    Everything needed to launch the queries is in the README. There is also a support matrix for the runners.
    
    Please do not squash the commits, because there are several authors: Mark, Ismaël and me.
    
    Happy reviewing :)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iemejia/beam BEAM-160-nexmark

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3114.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3114
    
----
commit a7364a56acddf93a041535ac3fe4561d4d655c6d
Author: Mark Shields <markshields@google.com>
Date:   2016-03-28T23:25:29Z

    NexMark

commit 6c013cb98b0011094889f8ae8e7e2646317cb813
Author: Mark Shields <markshields@google.com>
Date:   2016-06-03T00:32:49Z

    Port unit tests, cleanup pom and add license to readme

commit 316b7e6684cfb78340484b736e473bb967d54361
Author: Ismaël Mejía <iemejia@gmail.com>
Date:   2016-11-30T17:43:02Z

    Update Nexmark to the current Beam snapshot 0.7.0
    
    Refactor from InProcessRunner to DirectRunner
    Add Spark driver
    Add Apex runner
    Refine error logging per class and add log4j properties
    Move README to top level and add section on configuration
    Move project to the specific nexmark directory
    Fix existing issues to pass verify -Prelease
    Add running on the DirectRunner documentation

commit 2e47081f4c602c55b33ba783f014a8e1a8761acc
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-03-09T15:45:25Z

    Add comments on queries improvements and fix compilation config

commit e2a84c293dc56916f1ae4808de9efc961eac22f2
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-03-15T08:52:36Z

    Make NexmarkRunner generic and remove coupling with Google Dataflow
    
    issue #28

commit 1ce5fe901995482e1c678daac5070eef204af52d
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-03-15T14:25:41Z

    Activate monitoring on NexmarkSparkRunner
    issue #28

commit 319f7fc55e548a3ec0689e05e4bc0e48fff7964b
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-03-15T16:15:58Z

    Re-enable spark and flink in pom
    
    issue #28

commit 34061011945521528562cea0fd9ba5841ff6508f
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-03-15T16:34:31Z

    Activate monitoring on NexmarkFlinkRunner
    issue #28
    
    Fix compilation issue after rebase + make checkstyle happy again

commit b68fc71655783537749816e620f1bf697e8ed9a8
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-03-16T10:38:08Z

    Fix QueryTest
    
    Workaround for issue #22 + extra cleaning

commit 3d96de335e13f882d23d9519c48e24c4aabc4f25
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-03-16T14:57:18Z

    Replace junit asserts by hamcrest asserts
    Set numEvents in test to the minimum number that makes the tests pass
    issue #15
    
    comments, improve asserts (hamcrest), reformat
    
    For now make generate monothreaded

commit 939ff4fe9f8ca7182dd631c36a7d4725e1d06750
Author: Ismaël Mejía <iemejia@apache.org>
Date:   2017-03-21T17:29:20Z

    Fix Apex driver and update execution matrix

commit b1f33655eebf49f988e8503d9a996b646880bd4c
Author: Ismaël Mejía <iemejia@apache.org>
Date:   2017-03-23T18:32:45Z

    Refactor classes into packages
    
    The new hierarchy has logically based packages for:
    - drivers
    - io
    - model
    - queries
    - sources

commit 962919daf7956a474b1ccb917f4134bd0c4fc1f7
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-03-24T13:29:08Z

    Fix query5: Add comment on key lifting
    
    issue #30

commit 1a8f83ca3f6558791b412efe1c930bd65d66fe44
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-03-24T14:54:12Z

    Fix query10: Add comment for strange groupByKey
    
    issue #31

commit 3f6d62b175960789e34ce57b5ebc73022c358679
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-03-24T15:59:59Z

    Fix query11: Replace Count.perKey by Count.perElement
    
    issue #32

commit e8a6add06433c50d6bee6ca183f5b3dad64a817d
Author: Ismaël Mejía <iemejia@apache.org>
Date:   2017-03-29T08:10:13Z

    Fix compile after ParDo refactor

commit 39242faeec54b3405605e05e7bfd204a0bf6d731
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-04-03T13:18:04Z

    Fix query3: Use GlobalWindow to comply with the State/Timer APIs
    
    Issue #7

commit 38d42d3b017d40096a777ea35e66c07155dcb591
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-04-03T14:48:08Z

    Fix query3: Use timer for personState expiration in GlobalWindow
    
    Issue #29

commit 5571b2f655b62ce1f630247612bcc3463ddee112
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-04-03T14:50:51Z

    Fix Runner categories in tests

commit 845b91a76693f17c3d10e934262956d4fdf1ba1e
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-04-03T16:26:17Z

    Fix query12: Replace Count.perKey by Count.perElement
    
    Issue #34

commit 4475649ab537fa8cfaaa08a6cdd1fe8d6943ec62
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-04-11T14:14:19Z

    Add streaming unit tests
     Issue #37

commit a6517859af59a2fc2b8ceb872a344b23803aa70f
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-04-12T09:13:58Z

    Add trigger to global windowing in query3. Adding labels to query tests

commit a09ebb56f5022b588aaf77a674cf37c481a3b5ef
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-04-12T14:19:57Z

    Update unit tests: results are no more linked to the number of events
    
    issue #22

commit 4a768b1089655bcddf2771862cd84c8eba5c8682
Author: Ismaël Mejía <iemejia@apache.org>
Date:   2017-04-13T08:47:54Z

    Fix compile after PubsubIO refactor

commit 978ae6ba94c1224abf7b834b3fbae9d2b2e9b65a
Author: Ismaël Mejía <iemejia@apache.org>
Date:   2017-04-13T09:07:50Z

    Change Nexmark pom structure to mirror other modules on Beam

commit 6bcebb78b1b062ebeb36a0bb2ae9d77824d02c09
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-04-14T15:13:59Z

    Fix Spark streaming termination via waitUntilFinish and timeout config
    
    issue #39

commit d788d6abb50574c978db8fa044a156a9cfcfdfa6
Author: Ismaël Mejía <iemejia@apache.org>
Date:   2017-04-19T09:22:42Z

    Fix compile after sideOutput and split refactor

commit cac80dd461885d822c120481e7c33fcc38869356
Author: Ismaël Mejía <iemejia@apache.org>
Date:   2017-04-21T10:21:55Z

    Remove Accumulators and switch to the Metrics API

commit 9b874b886104f1db46f298c89c07dc32eaa88255
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-04-24T15:08:50Z

    Update execution matrix
    
    issue #45

commit 5fc854f972818848692faf960d60458e4a1a5727
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-04-28T08:29:38Z

    Fix compile after Coders and Pubsub refactor

----


> Port 'NexMark Queries' to Beam for use as integration test
> ----------------------------------------------------------
>
>                 Key: BEAM-160
>                 URL: https://issues.apache.org/jira/browse/BEAM-160
>             Project: Beam
>          Issue Type: Test
>          Components: testing
>            Reporter: Mark Shields
>            Assignee: Etienne Chauchot
>
> A while back we implemented the 'queries' from
>   http://datalab.cs.pdx.edu/niagara/NEXMark/
> as Google Dataflow pipelines. We found them useful
> for uncovering performance problems with the sdk, our runners,
> and our service. Many of those problems only manifested under
> high load, multi-day runs, or with high 'backlog' on the incoming
> pub/sub subscriptions.
> We thus think they would be useful for other runners.
> Disclaimer: Though the original 'queries' were proposed as a way to
> benchmark 'continuous SQL' implementations, we have so far only
> used them for internal A/B and regression testing and have not validated
> them as representative of customer workloads. We would thus discourage their use for
> competitive benchmarks without more work.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
