flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: MRQL on Flink
Date Thu, 28 Aug 2014 05:37:51 GMT
Very nice indeed! How well is this tested? Can it already run all the
example queries you have? Can you say anything about the performance
of the different underlying execution engines?

On Thu, Aug 28, 2014 at 12:58 AM, Stephan Ewen <sewen@apache.org> wrote:
> Wow, that is impressive!
>
>
> On Thu, Aug 28, 2014 at 12:06 AM, Ufuk Celebi <uce@apache.org> wrote:
>
>> Awesome, indeed! Looking forward to trying it out. :)
>>
>>
>> On Wed, Aug 27, 2014 at 10:52 PM, Sebastian Schelter <ssc@apache.org>
>> wrote:
>>
>> > Awesome!
>> >
>> >
>> > 2014-08-27 13:49 GMT-07:00 Leonidas Fegaras <fegaras@cse.uta.edu>:
>> >
>> > > Hello,
>> > > I would like to let you know that Apache MRQL can now run queries on
>> > > Flink.
>> > > MRQL is a query processing and optimization system for large-scale,
>> > > distributed data analysis, built on top of Apache Hadoop/map-reduce,
>> > > Hama, Spark, and now Flink. MRQL queries are SQL-like but not SQL.
>> > > They can work on complex, user-defined data (such as JSON and XML) and
>> > > can express complex queries (such as PageRank and matrix
>> > > factorization).
>> > >
>> > > MRQL on Flink has been tested in local mode and on a small Yarn
>> > > cluster.
>> > >
>> > > Here are the directions on how to build the latest MRQL snapshot:
>> > >
>> > > git clone https://git-wip-us.apache.org/repos/asf/incubator-mrql.git mrql
>> > > cd mrql
>> > > mvn -Pyarn clean install
>> > >
>> > > To make it run on your cluster, edit conf/mrql-env.sh and set the
>> > > Java, Hadoop, and Flink installation directories.
>> > >
>> > > Here is how to run PageRank. First, you need to generate a random
>> > > graph and store it in a file using the MRQL query RMAT.mrql:
>> > >
>> > > bin/mrql.flink -local queries/RMAT.mrql 1000 10000
>> > >
>> > > This creates a graph with 1K nodes and 10K edges using the RMAT
>> > > algorithm, removes duplicate edges, and stores the graph in the
>> > > binary file graph.bin. Then run PageRank in Flink mode using:
>> > >
>> > > bin/mrql.flink -local queries/pagerank.mrql
>> > >
>> > > To run MRQL/Flink on a Yarn cluster, first start the Flink container
>> > > on Yarn by running the script yarn-session.sh, for example:
>> > >
>> > > ${FLINK_HOME}/bin/yarn-session.sh -n 8
>> > >
>> > > This will print the name of the Flink JobManager, which can be used in:
>> > >
>> > > export FLINK_MASTER=name-of-the-Flink-JobManager
>> > > bin/mrql.flink -dist -nodes 16 queries/RMAT.mrql 1000000 10000000
>> > >
>> > > This will create a graph with 1M nodes and 10M edges using RMAT on 16
>> > > nodes (slaves). You can adjust these numbers to fit your cluster.
>> > > Then, run PageRank using:
>> > >
>> > > bin/mrql.flink -dist -nodes 16 queries/pagerank.mrql
>> > >
>> > > The MRQL project page is at: http://mrql.incubator.apache.org/
>> > >
>> > > Let me know if you have any questions.
>> > > Leonidas Fegaras
>> > >
>> > >
>> >
>>
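For reference, the local and distributed runs quoted above differ only in the flags passed to bin/mrql.flink. The following is a minimal dry-run sketch of that pattern. The function name mrql_pagerank_cmds and its argument order are hypothetical and not part of MRQL. It only prints the commands rather than executing them, and it uses no flags beyond those shown in the message.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of MRQL): prints the two commands
# from the quoted message for a given mode instead of executing them.
# Usage: mrql_pagerank_cmds local|dist NODES GRAPH_NODES GRAPH_EDGES
mrql_pagerank_cmds() {
    mode="$1"; nodes="$2"; graph_nodes="$3"; graph_edges="$4"
    if [ "$mode" = "local" ]; then
        flags="-local"                 # NODES is ignored in local mode
    else
        flags="-dist -nodes $nodes"
    fi
    # Step 1: generate the random graph with RMAT.mrql.
    echo "bin/mrql.flink $flags queries/RMAT.mrql $graph_nodes $graph_edges"
    # Step 2: run PageRank on the generated graph.
    echo "bin/mrql.flink $flags queries/pagerank.mrql"
}

mrql_pagerank_cmds local 0 1000 10000
mrql_pagerank_cmds dist 16 1000000 10000000
```

In distributed mode the printed commands still assume that the Flink session has been started with yarn-session.sh and that FLINK_MASTER has been exported, as described in the message.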
