apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Siyuan Hua <siy...@datatorrent.com>
Subject Re: High level API: Request for ideas
Date Wed, 23 Dec 2015 23:51:07 GMT
My first suggestion is we should focus on Stream API(or change the name we
call it) for now.  High-level API is confusion and could be anything that
helps.

Stream is in fact more well-known concept other than Operators, ports,
connector, etc. I think the idea originate from scala sequence API(
http://www.scala-lang.org/api/current/#scala.collection.Seq)
And the term "Stream" already implies some minimal function we need ex.
"map"(t1->f->t1'), "reduce" (t1, t2,...  -> f -> t1'), "filter" (t1 ->f(if
true) -> t1)
We shouldn't come up with arbitrary things so the API would become
cumbersome and hard to learn.






On Wed, Dec 23, 2015 at 3:12 PM, Siyuan Hua <siyuan@datatorrent.com> wrote:

> Another API that could be a reference is
> http://storm.apache.org/documentation/Trident-API-Overview.html
>
> On Wed, Dec 23, 2015 at 3:09 PM, Ashwin Chandra Putta <
> ashwinchandrap@gmail.com> wrote:
>
>> // Made few edits, ignore previous mail. Read this instead.
>>
>> David,
>>
>> I can imagine that it boils down to something like these function calls.
>>
>> DtString lines = readLines(new LineReader());
>> DtString words = lines.split(new LineSplitter());
>> DtNumber count  = words.count(new Counter());
>> count.print(new ConoleOutputOperator());
>>
>> Or
>>
>> readLines(new LineReader())
>>   .split(new LineSplitter())
>>   .count(new Counter())
>>   .print(new ConoleOutputOperator());
>>
>>
>> which translates to
>>
>> Reader reader = dag.addOperator("ReadLines", new LineReader());
>> Splitter splitter = dag.addOperator("Split", new LineSplitter());
>> Counter counter = dag.addOperator("Count", new WordCounter());
>> ConsoleOutputOperator console = dag.addOperator("Print", new
>> ConsoleOutputOperator());
>>
>> dag.addStream("lines", reader.output, splitter.input);
>> dag.addStream("words", splitter.output, counter.input);
>> dag.addStream("count", counter.output, console.input);
>>
>> Here are my initial thoughts:
>>
>> For the higher level api to work, we need the following support at least.
>>
>> 1. The operators used in the higher level api should have concrete
>> implementations with all available input and output ports defined at the
>> abstract level. Have to think more about how multiple output ports will
>> play out.
>> 2. We need to define the objects that have method calls available on them
>> that take operators as parameters.
>>
>> Eg: DtString can have method split and takes Splitter operator. And
>> Splitter operator should be abstract with input port type DtString and
>> output port type DtString. LineSplitter will be a concrete implementation
>> of this operator.
>>
>> Regards,
>> Ashwin.
>>
>> On Wed, Dec 23, 2015 at 3:07 PM, Ashwin Chandra Putta <
>> ashwinchandrap@gmail.com> wrote:
>>
>> > David,
>> >
>> > I can imagine that it boils down to something like these function calls.
>> >
>> > DtString lines = readLines(new LineReader());
>> > DtString words = lines.split(new LineSplitter());
>> > DtNumber count  = words.count(new Counter());
>> > count.print(new ConoleOutputOperator());
>> >
>> > Or
>> >
>> > readLines(new LineReader())
>> >   .split(new LineSplitter())
>> >   .count(new Counter())
>> >   .print(new ConoleOutputOperator());
>> >
>> >
>> > which translates to
>> >
>> > Reader reader = dag.addOperator("ReadLines", new LineReader());
>> > LineSplitter splitter = dag.addOperator("Split", new LineSplitter());
>> > WordCounter counter = dag.addOperator("Count", new Counter());
>> > ConsoleOutputOperator console = dag.addOperator("Print", new
>> > ConsoleOutputOperator());
>> >
>> > dag.addStream("lines", reader.output, splitter.input);
>> > dag.addStream("words", splitter.output, counter.input);
>> > dag.addStream("count", counter.output, console.input);
>> >
>> > Here are my initial thoughts:
>> >
>> > For the higher level api to work, we need the following support at
>> least.
>> >
>> > 1. The operators used in the higher level api should have concrete
>> > implementations with all available input and output ports defined at the
>> > abstract level. Have to think more about how multiple output ports will
>> > play out.
>> > 2. We need to define the objects that have method calls available on
>> them
>> > that take operators as parameters.
>> >
>> > Eg: DtString can have method split and takes Splitter operator. And
>> > Splitter operator should be abstract with input port type DtString and
>> > output port type DtString. LineSplitter will be a concrete
>> implementation
>> > of this operator.
>> >
>> > Regards,
>> > Ashwin.
>> >
>> > On Wed, Dec 23, 2015 at 1:42 PM, David Yan <david@datatorrent.com>
>> wrote:
>> >
>> >> Hi fellow Apex developers:
>> >>
>> >> Apex has a comprehensive API for constructing DAG topologies for
>> streaming
>> >> applications, using operators, ports and streams.  But this may seem
>> too
>> >> much for folks who just want to build simple applications, or just to
>> >> learn
>> >> about Apex.  For example, when you compare the code to do word count in
>> >> Apex with Spark Streaming or Flink, Apex requires much more code.
>> >>
>> >> Apex:
>> >>
>> >>
>> https://github.com/apache/incubator-apex-malhar/tree/master/demos/wordcount/src/main
>> >>
>> >> Spark Streaming:
>> >>
>> >>
>> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java
>> >>
>> >> Flink:
>> >>
>> >>
>> https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
>> >>
>> >> Note that their Scala versions are even simpler to use.
>> >>
>> >> The high-level requirements I have in mind is as follow:
>> >>
>> >> 1. A simple-to-use high-level API similar to what Spark Streaming and
>> >> Flink
>> >> have. And from the high-level API, the Apex engine will construct the
>> >> actual DAG topology at launch time.
>> >>
>> >> 2. The first language we will support is Java, but we will also want to
>> >> support Scala and possibly Python at some point, so the high-level API
>> >> should make it easy for implementing bindings for at least these two
>> >> languages.
>> >>
>> >> 3. We should be able to use the high-level API in Apex App Package
>> (apa)
>> >> file, so that dtcli can launch it just like a regular apa today.
>> >>
>> >> Please provide your ideas and thoughts on this topic.
>> >>
>> >> Thanks,
>> >>
>> >> David
>> >>
>> >
>> >
>> >
>> > --
>> >
>> > Regards,
>> > Ashwin.
>> >
>>
>>
>>
>> --
>>
>> Regards,
>> Ashwin.
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message