incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Trivial Update of "MRQLProposal" by LeonidasFegaras
Date Tue, 05 Mar 2013 15:54:47 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "MRQLProposal" page has been changed by LeonidasFegaras:
http://wiki.apache.org/incubator/MRQLProposal?action=diff&rev1=16&rev2=17

Comment:
fixed typos

  
  = Background =
  
- The initial code was developed at the University of Texas of Arlington (UTA) by a research
team, led by Leonidas Fegaras. The software was first released in May 2011. The original goal
of this project was to build a query processing system that translates SQL-like data analysis
queries to efficient workflows of !MapReduce jobs. A design goal was to use HDFS as the physical
storage layer, without any indexing, data partitioning, or data normalization, and to use
Hadoop (without extensions) as the run-time engine. The motivation behind this work was to
built a platform to test new ideas on query processing and optimization techniques applicable
to the !MapReduce framework.
+ The initial code was developed at the University of Texas of Arlington (UTA) by a research
team, led by Leonidas Fegaras. The software was first released in May 2011. The original goal
of this project was to build a query processing system that translates SQL-like data analysis
queries to efficient workflows of !MapReduce jobs. A design goal was to use HDFS as the physical
storage layer, without any indexing, data partitioning, or data normalization, and to use
Hadoop (without extensions) as the run-time engine. The motivation behind this work was to
build a platform to test new ideas on query processing and optimization techniques applicable
to the !MapReduce framework.
  
  A year ago, MRQL was extended to run on Hama. The motivation for this extension was that
Hadoop !MapReduce jobs were required to read their input and write their output on HDFS. This
simplifies reliability and fault tolerance but imposes a high overhead on complex !MapReduce
workflows and graph algorithms, such as !PageRank, which require repetitive jobs. In addition,
Hadoop does not preserve data in memory across consecutive !MapReduce jobs. This restriction
requires data to be re-read at every step, even when the data is constant. BSP, on the other hand,
does not suffer from this restriction, and, under certain circumstances, allows complex repetitive
algorithms to run entirely in the collective memory of a cluster. Thus, the goal was to be
able to run the same MRQL queries in both modes, !MapReduce and BSP, without modifying the
queries: If there are enough resources available, and low latency and speed are more important
than resilience, queries may run in BSP mode; otherwise, the same queries may run in !MapReduce
mode. BSP evaluation was found to be a good choice when fault tolerance is not critical, data
(both input and intermediate) can fit in the cluster memory, and data processing requires
complex/repetitive steps.
  
@@ -79, +79 @@

  We do not believe that this will be the case for MRQL for the years to come, because it
can be adapted to support new query languages, new optimization techniques, and new distributed
back-ends,
  thus sustaining enough research interest.
  Another risk is that, when graduate students who write code graduate, they may leave their
work undocumented and unfinished.
- We will strive to get enough momentum to recruit additional committers from industry in
order to eliminate these risks.
+ We will strive to gain enough momentum to recruit additional committers from industry in
order to eliminate these risks.
  
  == Inexperience with Open Source ==
  

---------------------------------------------------------------------
To unsubscribe, e-mail: cvs-unsubscribe@incubator.apache.org
For additional commands, e-mail: cvs-help@incubator.apache.org

