apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chinmay Kolhatkar <chin...@datatorrent.com>
Subject Re: High level API: Request for ideas
Date Thu, 24 Dec 2015 06:31:55 GMT
+1 for not coming up with our own APIs.

But on the other hand the functional programming paradigm is something that
people are familiar with.
To find a middle spot we can have the DAG object to support a builder
pattern.
Specially for streaming application, the builder patterns can describe the
step by step process to user of Apex.

Basically as follows:


*dag.addOperation("ReadLines", new LineReader())*
*      .**addOperation**("Split", new LineSplitter());*
*      .**addOperation**("Count", new Counter())*
*      .**addOperation**("Print", new ConsoleOutputOperator());*

This will translate to following:









*Reader reader = dag.addOperator("ReadLines", new
LineReader());LineSplitter splitter = dag.addOperator("Split", new
LineSplitter());WordCounter counter = dag.addOperator("Count", new
Counter());ConsoleOutputOperator console = dag.addOperator("Print",
new ConsoleOutputOperator());dag.addStream("lines", reader.output,
splitter.input);dag.addStream("words", splitter.output,
counter.input);dag.addStream("count", counter.output, console.input);*

Thoughts?


~ Chinmay.

On Thu, Dec 24, 2015 at 11:28 AM, Sandeep Deshmukh <sandeep@datatorrent.com>
wrote:

> +1 on not coming up with our own APIs. We should adapt to existing ones  so
> that there is additional learning curve for Apex users.
>
> It is more of a functional way of specifying the DAG and a subset of scala
> could be a good starting point.
>
> Regards,
> Sandeep
>
> On Thu, Dec 24, 2015 at 5:21 AM, Siyuan Hua <siyuan@datatorrent.com>
> wrote:
>
> > My first suggestion is we should focus on Stream API(or change the name
> we
> > call it) for now.  High-level API is confusion and could be anything that
> > helps.
> >
> > Stream is in fact more well-known concept other than Operators, ports,
> > connector, etc. I think the idea originate from scala sequence API(
> > http://www.scala-lang.org/api/current/#scala.collection.Seq)
> > And the term "Stream" already implies some minimal function we need ex.
> > "map"(t1->f->t1'), "reduce" (t1, t2,...  -> f -> t1'), "filter" (t1
> ->f(if
> > true) -> t1)
> > We shouldn't come up with arbitrary things so the API would become
> > cumbersome and hard to learn.
> >
> >
> >
> >
> >
> >
> > On Wed, Dec 23, 2015 at 3:12 PM, Siyuan Hua <siyuan@datatorrent.com>
> > wrote:
> >
> > > Another API that could be a reference is
> > > http://storm.apache.org/documentation/Trident-API-Overview.html
> > >
> > > On Wed, Dec 23, 2015 at 3:09 PM, Ashwin Chandra Putta <
> > > ashwinchandrap@gmail.com> wrote:
> > >
> > >> // Made few edits, ignore previous mail. Read this instead.
> > >>
> > >> David,
> > >>
> > >> I can imagine that it boils down to something like these function
> calls.
> > >>
> > >> DtString lines = readLines(new LineReader());
> > >> DtString words = lines.split(new LineSplitter());
> > >> DtNumber count  = words.count(new Counter());
> > >> count.print(new ConoleOutputOperator());
> > >>
> > >> Or
> > >>
> > >> readLines(new LineReader())
> > >>   .split(new LineSplitter())
> > >>   .count(new Counter())
> > >>   .print(new ConoleOutputOperator());
> > >>
> > >>
> > >> which translates to
> > >>
> > >> Reader reader = dag.addOperator("ReadLines", new LineReader());
> > >> Splitter splitter = dag.addOperator("Split", new LineSplitter());
> > >> Counter counter = dag.addOperator("Count", new WordCounter());
> > >> ConsoleOutputOperator console = dag.addOperator("Print", new
> > >> ConsoleOutputOperator());
> > >>
> > >> dag.addStream("lines", reader.output, splitter.input);
> > >> dag.addStream("words", splitter.output, counter.input);
> > >> dag.addStream("count", counter.output, console.input);
> > >>
> > >> Here are my initial thoughts:
> > >>
> > >> For the higher level api to work, we need the following support at
> > least.
> > >>
> > >> 1. The operators used in the higher level api should have concrete
> > >> implementations with all available input and output ports defined at
> the
> > >> abstract level. Have to think more about how multiple output ports
> will
> > >> play out.
> > >> 2. We need to define the objects that have method calls available on
> > them
> > >> that take operators as parameters.
> > >>
> > >> Eg: DtString can have method split and takes Splitter operator. And
> > >> Splitter operator should be abstract with input port type DtString and
> > >> output port type DtString. LineSplitter will be a concrete
> > implementation
> > >> of this operator.
> > >>
> > >> Regards,
> > >> Ashwin.
> > >>
> > >> On Wed, Dec 23, 2015 at 3:07 PM, Ashwin Chandra Putta <
> > >> ashwinchandrap@gmail.com> wrote:
> > >>
> > >> > David,
> > >> >
> > >> > I can imagine that it boils down to something like these function
> > calls.
> > >> >
> > >> > DtString lines = readLines(new LineReader());
> > >> > DtString words = lines.split(new LineSplitter());
> > >> > DtNumber count  = words.count(new Counter());
> > >> > count.print(new ConoleOutputOperator());
> > >> >
> > >> > Or
> > >> >
> > >> > readLines(new LineReader())
> > >> >   .split(new LineSplitter())
> > >> >   .count(new Counter())
> > >> >   .print(new ConoleOutputOperator());
> > >> >
> > >> >
> > >> > which translates to
> > >> >
> > >> > Reader reader = dag.addOperator("ReadLines", new LineReader());
> > >> > LineSplitter splitter = dag.addOperator("Split", new
> LineSplitter());
> > >> > WordCounter counter = dag.addOperator("Count", new Counter());
> > >> > ConsoleOutputOperator console = dag.addOperator("Print", new
> > >> > ConsoleOutputOperator());
> > >> >
> > >> > dag.addStream("lines", reader.output, splitter.input);
> > >> > dag.addStream("words", splitter.output, counter.input);
> > >> > dag.addStream("count", counter.output, console.input);
> > >> >
> > >> > Here are my initial thoughts:
> > >> >
> > >> > For the higher level api to work, we need the following support at
> > >> least.
> > >> >
> > >> > 1. The operators used in the higher level api should have concrete
> > >> > implementations with all available input and output ports defined
at
> > the
> > >> > abstract level. Have to think more about how multiple output ports
> > will
> > >> > play out.
> > >> > 2. We need to define the objects that have method calls available
on
> > >> them
> > >> > that take operators as parameters.
> > >> >
> > >> > Eg: DtString can have method split and takes Splitter operator. And
> > >> > Splitter operator should be abstract with input port type DtString
> and
> > >> > output port type DtString. LineSplitter will be a concrete
> > >> implementation
> > >> > of this operator.
> > >> >
> > >> > Regards,
> > >> > Ashwin.
> > >> >
> > >> > On Wed, Dec 23, 2015 at 1:42 PM, David Yan <david@datatorrent.com>
> > >> wrote:
> > >> >
> > >> >> Hi fellow Apex developers:
> > >> >>
> > >> >> Apex has a comprehensive API for constructing DAG topologies for
> > >> streaming
> > >> >> applications, using operators, ports and streams.  But this may
> seem
> > >> too
> > >> >> much for folks who just want to build simple applications, or
just
> to
> > >> >> learn
> > >> >> about Apex.  For example, when you compare the code to do word
> count
> > in
> > >> >> Apex with Spark Streaming or Flink, Apex requires much more code.
> > >> >>
> > >> >> Apex:
> > >> >>
> > >> >>
> > >>
> >
> https://github.com/apache/incubator-apex-malhar/tree/master/demos/wordcount/src/main
> > >> >>
> > >> >> Spark Streaming:
> > >> >>
> > >> >>
> > >>
> >
> https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java
> > >> >>
> > >> >> Flink:
> > >> >>
> > >> >>
> > >>
> >
> https://github.com/apache/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/WordCount.java
> > >> >>
> > >> >> Note that their Scala versions are even simpler to use.
> > >> >>
> > >> >> The high-level requirements I have in mind is as follow:
> > >> >>
> > >> >> 1. A simple-to-use high-level API similar to what Spark Streaming
> and
> > >> >> Flink
> > >> >> have. And from the high-level API, the Apex engine will construct
> the
> > >> >> actual DAG topology at launch time.
> > >> >>
> > >> >> 2. The first language we will support is Java, but we will also
> want
> > to
> > >> >> support Scala and possibly Python at some point, so the high-level
> > API
> > >> >> should make it easy for implementing bindings for at least these
> two
> > >> >> languages.
> > >> >>
> > >> >> 3. We should be able to use the high-level API in Apex App Package
> > >> (apa)
> > >> >> file, so that dtcli can launch it just like a regular apa today.
> > >> >>
> > >> >> Please provide your ideas and thoughts on this topic.
> > >> >>
> > >> >> Thanks,
> > >> >>
> > >> >> David
> > >> >>
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> > Regards,
> > >> > Ashwin.
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >> Regards,
> > >> Ashwin.
> > >>
> > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message