beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ieme...@apache.org
Subject [1/2] beam git commit: [BEAM-2853] remove Nexmark README.md now that the doc is in the website
Date Wed, 08 Nov 2017 14:29:06 GMT
Repository: beam
Updated Branches:
  refs/heads/master e0166ceb6 -> 9c5454287


[BEAM-2853] remove Nexmark README.md now that the doc is in the website


Project: http://git-wip-us.apache.org/repos/asf/beam/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam/commit/f78f4559
Tree: http://git-wip-us.apache.org/repos/asf/beam/tree/f78f4559
Diff: http://git-wip-us.apache.org/repos/asf/beam/diff/f78f4559

Branch: refs/heads/master
Commit: f78f45596bf5f3a5c3fd0b12b74688e1beeb3c7a
Parents: e0166ce
Author: Etienne Chauchot <echauchot@gmail.com>
Authored: Wed Nov 8 11:23:13 2017 +0100
Committer: Etienne Chauchot <echauchot@gmail.com>
Committed: Wed Nov 8 11:23:13 2017 +0100

----------------------------------------------------------------------
 sdks/java/nexmark/README.md | 340 ---------------------------------------
 1 file changed, 340 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam/blob/f78f4559/sdks/java/nexmark/README.md
----------------------------------------------------------------------
diff --git a/sdks/java/nexmark/README.md b/sdks/java/nexmark/README.md
deleted file mode 100644
index f252943..0000000
--- a/sdks/java/nexmark/README.md
+++ /dev/null
@@ -1,340 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements.  See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership.  The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied.  See the License for the
-    specific language governing permissions and limitations
-    under the License.
--->
-
-# NEXMark test suite
-
-This is a suite of pipelines inspired by the 'continuous data stream'
-queries in [http://datalab.cs.pdx.edu/niagaraST/NEXMark/]
-(http://datalab.cs.pdx.edu/niagaraST/NEXMark/).
-
-These are multiple queries over a three entities model representing on online auction system:
-
- - **Person** represents a person submitting an item for auction and/or making a bid
-    on an auction.
- - **Auction** represents an item under auction.
- - **Bid** represents a bid for an item under auction.
-
-The queries exercise many aspects of Beam model:
-
-* **Query1**: What are the bid values in Euro's?
-  Illustrates a simple map.
-* **Query2**: What are the auctions with particular auction numbers?
-  Illustrates a simple filter.
-* **Query3**: Who is selling in particular US states?
-  Illustrates an incremental join (using per-key state and timer) and filter.
-* **Query4**: What is the average selling price for each auction
-  category?
-  Illustrates complex join (using custom window functions) and
-  aggregation.
-* **Query5**: Which auctions have seen the most bids in the last period?
-  Illustrates sliding windows and combiners.
-* **Query6**: What is the average selling price per seller for their
-  last 10 closed auctions.
-  Shares the same 'winning bids' core as for **Query4**, and
-  illustrates a specialized combiner.
-* **Query7**: What are the highest bids per period?
-  Deliberately implemented using a side input to illustrate fanout.
-* **Query8**: Who has entered the system and created an auction in
-  the last period?
-  Illustrates a simple join.
-
-We have augmented the original queries with five more:
-
-* **Query0**: Pass-through.
-  Allows us to measure the monitoring overhead.
-* **Query9**: Winning-bids.
-  A common sub-query shared by **Query4** and **Query6**.
-* **Query10**: Log all events to GCS files.
-  Illustrates windows with large side effects on firing.
-* **Query11**: How many bids did a user make in each session they
-  were active?
-  Illustrates session windows.
-* **Query12**: How many bids does a user make within a fixed
-  processing time limit?
-  Illustrates working in processing time in the Global window, as
-  compared with event time in non-Global windows for all the other
-  queries.
-
-We can specify the Beam runner to use with maven profiles, available profiles are:
-
-* direct-runner
-* spark-runner
-* flink-runner
-* apex-runner
-
-The runner must also be specified like in any other Beam pipeline using
-
-    --runner
-
-
-Test data is deterministically synthesized on demand. The test
-data may be synthesized in the same pipeline as the query itself,
-or may be published to Pubsub.
-
-The query results may be:
-
-* Published to Pubsub.
-* Written to text files as plain text.
-* Written to text files using an Avro encoding.
-* Send to BigQuery.
-* Discarded.
-
-# Configuration
-
-## Common configuration parameters
-
-Decide if batch or streaming:
-
-    --streaming=true
-
-Number of events generators
-
-    --numEventGenerators=4
-
-Run query N
-
-    --query=N
-
-## Available Suites
-The suite to run can be chosen using this configuration parameter:
-
-    --suite=SUITE
-
-Available suites are:
-* DEFAULT: Test default configuration with query 0.
-* SMOKE: Run the 12 default configurations.
-* STRESS: Like smoke but for 1m events.
-* FULL_THROTTLE: Like SMOKE but 100m events.
-
-   
-
-## Apex specific configuration
-
-    --manageResources=false --monitorJobs=false
-
-## Dataflow specific configuration
-
-    --manageResources=false --monitorJobs=true \
-    --enforceEncodability=false --enforceImmutability=false
-    --project=<your project> \
-    --zone=<your zone> \
-    --workerMachineType=n1-highmem-8 \
-    --stagingLocation=<a gs path for staging> \
-    --runner=DataflowRunner \
-    --tempLocation=gs://talend-imejia/nexmark/temp/ \
-    --stagingLocation=gs://talend-imejia/nexmark/temp/staging/ \
-    --filesToStage=target/beam-sdks-java-nexmark-2.1.0-SNAPSHOT.jar
-
-## Direct specific configuration
-
-    --manageResources=false --monitorJobs=true \
-    --enforceEncodability=false --enforceImmutability=false
-
-## Flink specific configuration
-
-    --manageResources=false --monitorJobs=true \
-    --flinkMaster=local --parallelism=#numcores
-
-## Spark specific configuration
-
-    --manageResources=false --monitorJobs=true \
-    --sparkMaster=local \
-    -Dspark.ui.enabled=false -DSPARK_LOCAL_IP=localhost -Dsun.io.serialization.extendedDebugInfo=true
-
-# Current Status
-
-Open issues are tracked [here](https://github.com../../../../../issues):
-
-## Batch / Synthetic / Local
-
-| Query | Direct | Spark                                                        | Flink 
                                                    | Apex                               
                         |
-| ----: | ------ | ------------------------------------------------------------ | ----------------------------------------------------------
| ------------------------------------------------------------ |
-|     0 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|     1 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|     2 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|     3 | ok     | [BEAM-1035](https://issues.apache.org/jira/browse/BEAM-1035) | ok    
                                                    | [BEAM-1037](https://issues.apache.org/jira/browse/BEAM-1037)
|
-|     4 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|     5 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|     6 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|     7 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|     8 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|     9 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|    10 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|    11 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|    12 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-
-## Streaming / Synthetic / Local
-
-| Query | Direct | Spark                                                        | Flink 
                                                    | Apex                               
                         |
-| ----: | ------ | ------------------------------------------------------------ | ----------------------------------------------------------
| ------------------------------------------------------------ |
-|     0 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|     1 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|     2 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|     3 | ok     | [BEAM-1035](https://issues.apache.org/jira/browse/BEAM-1035) | ok    
                                                    | [BEAM-1037](https://issues.apache.org/jira/browse/BEAM-1037)
|
-|     4 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|     5 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|     6 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|     7 | ok     | [BEAM-2112](https://issues.apache.org/jira/browse/BEAM-2112) | ok    
                                                    | ok                                 
                         |
-|     8 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|     9 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|    10 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|    11 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-|    12 | ok     | ok                                                           | ok    
                                                    | ok                                 
                         |
-
-## Batch / Synthetic / Cluster
-
-TODO
-
-| Query | Dataflow                       | Spark                          | Flink       
                  | Apex                           |
-| ----: | ------------------------------ | ------------------------------ | ------------------------------
| ------------------------------ |
-|     0 |                                |                                |             
                  |                                |
-
-## Streaming / Synthetic / Cluster
-
-TODO
-
-| Query | Dataflow                       | Spark                          | Flink       
                  | Apex                           |
-| ----: | ------------------------------ | ------------------------------ | ------------------------------
| ------------------------------ |
-|     0 |                                |                                |             
                  |                                |
-
-# Running Nexmark
-
-## Running SMOKE suite on the DirectRunner (local)
-
-Batch Mode
-
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pdirect-runner -Dexec.args="--runner=DirectRunner
--suite=SMOKE --streaming=false --manageResources=false --monitorJobs=true --enforceEncodability=true
--enforceImmutability=true"
-
-Streaming Mode
-
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pdirect-runner -Dexec.args="--runner=DirectRunner
--suite=SMOKE --streaming=true --manageResources=false --monitorJobs=true --enforceEncodability=true
--enforceImmutability=true"
-
-
-## Running SMOKE suite on the SparkRunner (local)
-
-Batch Mode
-
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pspark-runner "-Dexec.args=--runner=SparkRunner
--suite=SMOKE --streamTimeout=60 --streaming=false --manageResources=false --monitorJobs=true"
-
-Streaming Mode
-
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pspark-runner "-Dexec.args=--runner=SparkRunner
--suite=SMOKE --streamTimeout=60 --streaming=true --manageResources=false --monitorJobs=true"
-
-
-## Running SMOKE suite on the FlinkRunner (local)
-
-Batch Mode
-
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pflink-runner "-Dexec.args=--runner=FlinkRunner
--suite=SMOKE --streamTimeout=60 --streaming=false --manageResources=false --monitorJobs=true
 --flinkMaster=local"
-
-Streaming Mode
-
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Pflink-runner "-Dexec.args=--runner=FlinkRunner
--suite=SMOKE --streamTimeout=60 --streaming=true --manageResources=false --monitorJobs=true
 --flinkMaster=local"
-
-
-## Running SMOKE suite on the ApexRunner (local)
-
-Batch Mode
-
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Papex-runner "-Dexec.args=--runner=ApexRunner
--suite=SMOKE --streamTimeout=60 --streaming=false --manageResources=false --monitorJobs=false"
-
-Streaming Mode
-
-    mvn exec:java -Dexec.mainClass=org.apache.beam.sdk.nexmark.Main -Papex-runner "-Dexec.args=--runner=ApexRunner
--suite=SMOKE --streamTimeout=60 --streaming=true --manageResources=false --monitorJobs=false"
-
-
-## Running SMOKE suite on Google Cloud Dataflow
-
-Building package
-
-    mvn clean package -Pdataflow-runner
-
-Submit to Google Dataflow service
-
-
-```
-java -cp sdks/java/nexmark/target/beam-sdks-java-nexmark-bundled-2.1.0-SNAPSHOT.jar \
-  org.apache.beam.sdk.nexmark.Main \
-  --runner=DataflowRunner
-  --project=<your project> \
-  --zone=<your zone> \
-  --workerMachineType=n1-highmem-8 \
-  --stagingLocation=<a gs path for staging> \
-  --streaming=true \
-  --sourceType=PUBSUB \
-  --pubSubMode=PUBLISH_ONLY \
-  --pubsubTopic=<an existing Pubsub topic> \
-  --resourceNameMode=VERBATIM \
-  --manageResources=false \
-  --monitorJobs=false \
-  --numEventGenerators=64 \
-  --numWorkers=16 \
-  --maxNumWorkers=16 \
-  --suite=SMOKE \
-  --firstEventRate=100000 \
-  --nextEventRate=100000 \
-  --ratePeriodSec=3600 \
-  --isRateLimited=true \
-  --avgPersonByteSize=500 \
-  --avgAuctionByteSize=500 \
-  --avgBidByteSize=500 \
-  --probDelayedEvent=0.000001 \
-  --occasionalDelaySec=3600 \
-  --numEvents=0 \
-  --useWallclockEventTime=true \
-  --usePubsubPublishTime=true \
-  --experiments=enable_custom_pubsub_sink
-```
-
-```
-java -cp sdks/java/nexmark/target/beam-sdks-java-nexmark-bundled-2.1.0-SNAPSHOT.jar \
-  org.apache.beam.sdk.nexmark.Main \
-  --runner=DataflowRunner
-  --project=<your project> \
-  --zone=<your zone> \
-  --workerMachineType=n1-highmem-8 \
-  --stagingLocation=<a gs path for staging> \
-  --streaming=true \
-  --sourceType=PUBSUB \
-  --pubSubMode=SUBSCRIBE_ONLY \
-  --pubsubSubscription=<an existing Pubsub subscription to above topic> \
-  --resourceNameMode=VERBATIM \
-  --manageResources=false \
-  --monitorJobs=false \
-  --numWorkers=64 \
-  --maxNumWorkers=64 \
-  --suite=SMOKE \
-  --usePubsubPublishTime=true \
-  --outputPath=<a gs path under which log files will be written> \
-  --windowSizeSec=600 \
-  --occasionalDelaySec=3600 \
-  --maxLogEvents=10000 \
-  --experiments=enable_custom_pubsub_source
-```
-
-## Running query 0 on a Spark cluster with yarn
-
-Building package
-
-    mvn clean package -Pspark-runner
-
-Submit to the cluster
-
-    spark-submit --master yarn-client --class org.apache.beam.sdk.nexmark.Main --driver-memory
512m --executor-memory 512m --executor-cores 1 beam-sdks-java-nexmark-bundled-2.1.0-SNAPSHOT.jar
--runner=SparkRunner --query=0 --streamTimeout=60 --streaming=false --manageResources=false
--monitorJobs=true
-


Mime
View raw message