flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Juan Fumero <juan.jose.fumero.alfo...@oracle.com>
Subject Re: FastR-Flink: a new open source Truffle project
Date Mon, 26 Oct 2015 13:06:41 GMT
Hi Max, 
   yes, we started running some benchmarks, but still this is very
preliminary version. Concerning performance what I can tell is we have
very good speedups on shared memory compared to fastR. Concerning
cluster applications we do not have good speedups yet for big R
applications. We are studying/investigating the problem. 

General speaking, for each Flink thread, we build the AST which
represents the R code (user function) and then it will be specialized
(partial evaluation). After a while, the R code will be compiled to an
optimize machine code via Truffle+Graal. 

In the link you showed are the operations we support for the moment. We
plan to support more operations.


On Mon, 2015-10-26 at 13:43 +0100, Maximilian Michels wrote:
> Hi Juan,
> Thanks for sharing. This looks very promising. A great way for people
> who want to use R and Flink without compromising much on performance.
> I would be curious about some cluster performance tests. Have you run
> any yet?
> Best,
> max
> https://bitbucket.org/allr/fastr-flink/src/71cf3f264a1faf98182781c9cdabef1b677a3bc6/README_FASTR_FLINK.md
> Here are the supported operations from the Wiki:
> Operations supported
> FastR operation Description flink.localApply(input, func, scope=c(""))
> Blocking sapply function in R. It runs with local Flink configuration
> flink.remoteApply(input, func, scope=c("")) Blocking remote sapply.
> Remote enviroment flink.map(x, func, scope=c(""), local=FALSE)
> Non-blocking sapply. It allows to use pattern composition. If
> LOCAL=FALSE, it uses remote Flink configuration
> flink.connectToJobManager(ip) It establishes the IP for remote
> execution flink.setParallelism(nThreads) Set the number of
> threads/workers flink.readTextFile(path) Read HDFS file
> flink.writeAsText(path) Write file into HDFS flink.flatMap(x, func,
> scope=c("")) Non blocking flatMap in Flink flink.groupBy(object)
> GroupBy previous operation (object) flink.sum(object) Sum previous
> operation (object) flink.execute() It cosumes the operation pipeline
> defined flink.collect(object) It consumes the pipeline and return the
> R result flink.filter(object, func) Filter previous operation (only
> map supported) according to the function passed as argument. This
> operation is experimental.
> On Mon, Oct 26, 2015 at 1:09 PM, Juan Fumero
> <juan.jose.fumero.alfonso@oracle.com> wrote:
>         Hi everyone,
>            we have just published a new open source Truffle project,
>         FastR-Flink. It is available in
>         https://bitbucket.org/allr/fastr-flink
>         FastR is an implementation of the R language on top of Truffle
>         and Graal
>         [3] developed by Purdue University, Johannes Kepler University
>         and
>         Oracle Labs [1].
>         FastR-Flink is fork of FastR project which includes Apache
>         Flink [2]
>         connection, allowing to execute distributed stream and batch
>         data
>         processing applications from R programming language by using
>         FastR
>         +Graal. This is a prototype we have been working on in Oracle
>         Labs
>         during my internship.
>         See README_FASTR_FLINK in
>         https://bitbucket.org/allr/fastr-flink
>         for more information about how to compile and start running
>         FastR+Flink
>         applications. I also included a directory with a few examples.
>         Juan
>         [1] FastR: https://bitbucket.org/allr/fastr
>         [2] Apache Flink: https://flink.apache.org/
>         [3] Graal: http://openjdk.java.net/projects/graal/

View raw message