incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Update of "MRQLProposal" by LeonidasFegaras
Date Thu, 28 Feb 2013 19:58:23 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "MRQLProposal" page has been changed by LeonidasFegaras:
http://wiki.apache.org/incubator/MRQLProposal?action=diff&rev1=6&rev2=7

Comment:
added more info

  = Abstract =
  
- MRQL (pronounced ''miracle'') is a query processing and optimization system for large-scale,
distributed data analysis, built on top of Apache Hadoop and Hama.
+ MRQL is a query processing and optimization system for large-scale, distributed data analysis,
built on top of Apache Hadoop and Hama.
- 
- MRQL (the MapReduce Query Language) is an SQL-like query language for large-scale data analysis
on a computer cluster. The MRQL query processing system can execute MRQL queries in two modes:
using the MapReduce framework on Apache Hadoop or using the Bulk Synchronous Parallel (BSP)
framework on Apache Hama. The MRQL query language is powerful enough to express most common
data analysis tasks over many forms of raw data, such as XML and JSON documents, binary files,
and line-oriented text documents with comma-separated values. In contrast to standard SQL,
MRQL supports a richer data model (including nested collections), arbitrary query nesting,
and user-defined types and functions. MRQL is more powerful than other current high-level
MapReduce languages, such as Hive and Pig Latin, since it can operate on more complex data
and supports more powerful query constructs, thus eliminating the need for using explicit
map-reduce code. With MRQL, users will be able to express complex data analysis tasks, such
as pagerank, k-means clustering, matrix factorization, etc, using declarative queries only,
while the MRQL query processing system will be able to compile these queries to efficient
Java code.
  
  = Proposal =
  
+ MRQL (pronounced ''miracle'') is a query processing and optimization system for large-scale,
distributed data analysis. MRQL (the Map-Reduce Query Language) is an SQL-like query language
for large-scale data analysis on a cluster of computers. The MRQL query processing system
can execute MRQL queries in two modes: in Map-Reduce mode on top of [[http://hadoop.apache.org/|Apache
Hadoop]] or in Bulk Synchronous Parallel (BSP) mode on top of [[http://hama.apache.org/|Apache
Hama]]. The MRQL query language is powerful enough to express most common data analysis tasks
over many forms of raw data, such as XML and JSON documents, binary files, and line-oriented
text documents with comma-separated values. MRQL is more powerful than other current high-level
Map-Reduce languages, such as Hive and Pig Latin, since it can operate on more complex data
and supports more powerful query constructs, thus eliminating the need for using explicit
Map-Reduce code. With MRQL, users will be able to express complex data analysis tasks, such
as PageRank, k-means clustering, and matrix factorization, using SQL-like queries exclusively,
while the MRQL query processing system will be able to compile these queries to efficient
Java code.
  
  = Background =
  
+ The initial code was developed at the University of Texas at Arlington by a research
team led by Leonidas Fegaras. The original goal was to build a query processing system that
translates SQL-like data analysis queries to efficient workflows of Map-Reduce jobs. Our goal
was to use HDFS as the physical storage layer, without any indexing, data partitioning, or
data normalization,
+ and to use Hadoop (without extensions) as the run-time engine.
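
As a rough illustration of what translating an SQL-like query into a Map-Reduce job involves, the sketch below hand-writes the map, shuffle, and reduce steps for a query along the lines of "average salary per department". It is not MRQL syntax and not code generated by MRQL; the record layout, the sample data, and the function names are assumptions made only for this example, and everything runs in a single process rather than on Hadoop.

```python
from collections import defaultdict

# Hypothetical input: (department, salary) records, standing in for rows
# parsed from a line-oriented text file. Not taken from the MRQL sources.
records = [("cs", 100), ("ee", 80), ("cs", 120), ("ee", 90), ("math", 70)]


def map_phase(rows):
    """Emit one (key, value) pair per row; the group-by key is the department."""
    for dept, salary in rows:
        yield dept, salary


def shuffle(pairs):
    """Collect values by key, as a Map-Reduce framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each group; here the aggregate is the average salary."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


result = reduce_phase(shuffle(map_phase(records)))
print(result)
```

A declarative query system hides these three steps behind a single group-by query, which is the kind of explicit Map-Reduce code the proposal says users should not have to write.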
  
  = Rationale =
  
+ = Initial Goals =
+ 
+ Some current goals include:
+ 
+  * apply MRQL to graph analysis problems, such as k-means clustering and pagerank
+ 
+  * apply MRQL to large-scale scientific analysis (develop general optimization techniques
that can apply to matrix multiplication, matrix factorization, etc)
+ 
+  * process additional data formats, such as [[http://avro.apache.org/|Avro]]
+ 
+  * map MRQL to additional distributed processing frameworks, such as [[http://spark-project.org/|Spark]]
and [[http://www.open-mpi.org/|OpenMPI]]
  
  = Current Status =
+ 
+ Currently, MRQL is in a beta release (version 0.8.10). It is built on top of Hadoop and
Hama (no extensions are needed).
+ It runs on Hadoop up to version 1.0.4 (but not yet on YARN) and on Hama 0.5.0.
+ It has only been tested on a small cluster of 20 nodes (80 cores).
  
  == Meritocracy ==
  
@@ -49, +65 @@

  == An Excessive Fascination with the Apache Brand ==
  
  
- 
  = Documentation =
  
- More information at:
+ Information about MRQL can be found at:
  [[http://lambda.uta.edu/mrql/|MRQL: an Optimization Framework for Map-Reduce Queries]]
  
  = Initial Source =
  
+ The initial MRQL code was developed as part of a research project at the University
of Texas at Arlington and has been available under the Apache 2.0 license for the past two years.
+ The source code is currently hosted on GitHub at:
- [[https://github.com/fegaras/mrql|https://github.com/fegaras/mrql]]
+ [[https://github.com/fegaras/mrql|https://github.com/fegaras/mrql]].
+ 
+ MRQL’s release artifact would consist of a single tarball of packaging and test code.
  
  = External Dependencies =
  
+ The MRQL source code is already licensed under the Apache License, Version 2.0. MRQL uses
JLine, which is distributed under the BSD license.
  
  = Cryptography =
  
  Not applicable.
- 
  
  = Required Resources =
  
@@ -73, +92 @@

  
  == Subversion Directory ==
  
+ [[https://github.com/fegaras/mrql.git|https://github.com/fegaras/mrql.git]]
  
  == Issue Tracking ==
  
@@ -83, +103 @@

  = Initial Committers =
  
   * Leonidas Fegaras <fegaras AT cse DOT uta DOT edu>
+  * Upa Gupta <upa.gupta AT mavs DOT uta DOT edu>
   * Edward J. Yoon <edwardyoon AT apache DOT org>
   * Maqsood Alam <maqsoodalam AT hotmail DOT com>
   * John Hope <john.hope AT oracle DOT com>
@@ -92, +113 @@

  = Affiliations =
  
   * Leonidas Fegaras (University of Texas at Arlington)
+  * Upa Gupta (University of Texas at Arlington)
   * Edward J. Yoon (Oracle corp)
   * Maqsood Alam (Oracle corp)
   * John Hope (Oracle corp)

---------------------------------------------------------------------
To unsubscribe, e-mail: cvs-unsubscribe@incubator.apache.org
For additional commands, e-mail: cvs-help@incubator.apache.org

