gearpump-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GEARPUMP-109) Add Java example for WordCount in document.
Date Thu, 28 Apr 2016 01:21:12 GMT

    [ https://issues.apache.org/jira/browse/GEARPUMP-109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261348#comment-15261348
] 

ASF GitHub Bot commented on GEARPUMP-109:
-----------------------------------------

Github user manuzhang commented on a diff in the pull request:

    https://github.com/apache/incubator-gearpump/pull/6#discussion_r61362610
  
    --- Diff: docs/dev-write-1st-app.md ---
    @@ -14,23 +14,131 @@ Repository and library dependencies can be found at [Maven Setting](maven-settin
     ### IDE Setup (Optional)
     You can get your preferred IDE ready for Gearpump by following [this guide](dev-ide-setup.html).
     
    -### Define Processor(Task) class and Partitioner class
    +### Decide which language and API to use for writing 
    +Gearpump supports two level APIs:
     
    -An application is a Directed Acyclic Graph (DAG) of processors. In the wordcount example,
We will firstly define two processors `Split` and `Sum`, and then weave them together.
    +1. Low level API, which is more similar to Akka programming, operating on each event.
The API document can be found at [Low Level API Doc](http://www.gearpump.io/releases/latest/api/scala/index.html#io.gearpump.streaming.package).
    +
    +2. High level API (aka DSL), which is operating on streaming instead of individual event.
The API document can be found at [DSL API Doc](http://www.gearpump.io/releases/latest/api/scala/index.html#io.gearpump.streaming.dsl.package).
    +
    +And both APIs have their Java version and Scala version.
    +
    +So, before you writing our first Gearpump application, you need to decide which API to
use and which language to use. 
    +
    +## DSL version for Wordcount
    +
    +The easiest way to write your streaming application is to write it with Gearpump DSL.

    +Below will demostrate how to write WordCount application via Gearpump DSL.
    +
    +
    +<div class="codetabs">
    +<div data-lang="scala"  markdown="1" >
    +
    +
    +```scala
    +/** WordCount with High level DSL */
    +object WordCount extends AkkaApp with ArgumentsParser {
    +
    +  override val options: Array[(String, CLIOption[Any])] = Array.empty
     
    -#### About message type
    +  override def main(akkaConf: Config, args: Array[String]): Unit = {
    +    val context = ClientContext(akkaConf)
    +    val app = StreamApp("dsl", context)
    +    val data = "This is a good start, bingo!! bingo!!"
     
    -User are allowed to send message of type Any(Accept any type except Null, Nothing and
Unit).
    +    //count for each word and output to log
    +    app.source(data.lines.toList, 1, "source").
    +      // word => (word, count)
    +      flatMap(line => line.split("[\\s]+")).map((_, 1)).
    +      // (word, count1), (word, count2) => (word, count1 + count2)
    +      groupByKey().sum.log
     
    +    val appId = context.submit(app)
    +    context.close()
    +  }
    +}
     ```
    -case class Message(msg: Any, timestamp: TimeStamp = Message.noTimeStamp)
    +
    +</div>
    +
    +<div data-lang="java" markdown="1">
    +
    +```java
    +
    +/** Java version of WordCount with high level DSL API */
    +public class WordCount {
    +
    +  public static void main(String[] args) throws InterruptedException {
    +    main(ClusterConfig.defaultConfig(), args);
    +  }
    +
    +  public static void main(Config akkaConf, String[] args) throws InterruptedException
{
    +    ClientContext context = new ClientContext(akkaConf);
    +    JavaStreamApp app = new JavaStreamApp("JavaDSL", context, UserConfig.empty());
    +    List<String> source = Lists.newArrayList("This is a good start, bingo!! bingo!!");
    +
    +    //create a stream from the string list.
    +    JavaStream<String> sentence = app.source(source, 1, UserConfig.empty(), "source");
    +
    +    //tokenize the strings and create a new stream
    +    JavaStream<String> words = sentence.flatMap(new FlatMapFunction<String,
String>() {
    +      @Override
    +      public Iterator<String> apply(String s) {
    +        return Lists.newArrayList(s.split("\\s+")).iterator();
    +      }
    +    }, "flatMap");
    +
    +    //map each string as (string, 1) pair
    +    JavaStream<Tuple2<String, Integer>> ones = words.map(new MapFunction<String,
Tuple2<String, Integer>>() {
    +      @Override
    +      public Tuple2<String, Integer> apply(String s) {
    +        return new Tuple2<String, Integer>(s, 1);
    +      }
    +    }, "map");
    +
    +    //group by according to string
    +    JavaStream<Tuple2<String, Integer>> groupedOnes = ones.groupBy(new GroupByFunction<Tuple2<String,
Integer>, String>() {
    +      @Override
    +      public String apply(Tuple2<String, Integer> tuple) {
    +        return tuple._1();
    +      }
    +    }, 1, "groupBy");
    +
    +    //for each group, make the sum
    +    JavaStream<Tuple2<String, Integer>> wordcount = groupedOnes.reduce(new
ReduceFunction<Tuple2<String, Integer>>() {
    +      @Override
    +      public Tuple2<String, Integer> apply(Tuple2<String, Integer> t1, Tuple2<String,
Integer> t2) {
    +        return new Tuple2<String, Integer>(t1._1(), t1._2() + t2._2());
    +      }
    +    }, "reduce");
    +
    +    //output result using log
    +    wordcount.log();
    +
    +    app.run();
    +    context.close();
    +  }
    +}
     ```
     
    +</div>
    +
    +</div>
    +
    +## Low level API based Wordcount
    +
    +### Define Processor(Task) class and Partitioner class
    +
    +An application is a Directed Acyclic Graph (DAG) of processors. In the wordcount example,
We will firstly define two processors `Split` and `Sum`, and then weave them together.
    +
    +
     #### Split processor
     
    -In the Split processor, we simply split a predefined text (the content is simplified
for conciseness) and send out each split word to Sum.
    +In the `Split` processor, we simply split a predefined text (the content is simplified
for conciseness) and send out each split word to Sum.
    --- End diff --
    
    "Sum" => "`Sum`"


> Add Java example for WordCount in document.
> -------------------------------------------
>
>                 Key: GEARPUMP-109
>                 URL: https://issues.apache.org/jira/browse/GEARPUMP-109
>             Project: Apache Gearpump
>          Issue Type: Improvement
>          Components: doc
>    Affects Versions: 0.8.1
>            Reporter: Weihua Jiang
>            Assignee: Weihua Jiang
>            Priority: Minor
>
> Currently, in our document, we only have Scala version for wordcount. It is better to
have both Scala version and Java version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message