hadoop-common-dev mailing list archives

From "jiang hehui (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-13304) distributed database for store , mapreduce for compute
Date Tue, 21 Jun 2016 08:46:57 GMT
jiang hehui created HADOOP-13304:
------------------------------------

             Summary: distributed database for store , mapreduce for compute
                 Key: HADOOP-13304
                 URL: https://issues.apache.org/jira/browse/HADOOP-13304
             Project: Hadoop Common
          Issue Type: New Feature
          Components: fs
    Affects Versions: 2.6.4
            Reporter: jiang hehui


In Hadoop, HDFS is responsible for storage and MapReduce is responsible for computation.
My idea is to store the data in a distributed database and to compute over it in a MapReduce-like way.

!http://images2015.cnblogs.com/blog/439702/201606/439702-20160621124133334-32823985.png!

* insert: 
using two-phase commit, execute the insert on the nodes selected by the split policy

* delete: 
using two-phase commit, execute the delete on the nodes selected by the split policy

* update:
using two-phase commit and the split policy: if the record stays on the same node, just
execute the update on that node; if the record moves to a different node, first delete the
old value on the source node, then insert the new value on the destination node.
* select:
** simple select (the data is on a single node, or no merging of data across nodes is needed):
handled the same way as on a standalone database server;
** complex select (distinct, group by, order by, subquery, or join across multiple nodes):
we call this a job
{panel}
{color:red}A job is parsed into stages. Stages have lineage, and all stages of a job form a
DAG (directed acyclic graph). Every stage contains a mapsql, a shuffle, and a reducesql step.
When an SQL request is received, an execution plan is generated from the metadata. The plan
contains the DAG, including each stage and the mapsql, shuffle, and reducesql of that stage.
The plan is then executed and the result is returned to the client.

This mirrors Spark: an RDD corresponds to a table, and a job to a job.
It also mirrors MapReduce in Hadoop: mapsql corresponds to map, shuffle to shuffle, and
reducesql to reduce.
{color}
{panel}
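
The insert, delete, and update rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the proposal: the `Node` class, the `shard_key` column, and the modulo split policy are all hypothetical, and the nodes are plain in-memory dictionaries rather than real database servers.

```python
class Node:
    """In-memory stand-in for one database node (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}    # committed rows, keyed by primary key
        self.staged = []  # operations prepared but not yet committed

    def prepare(self, ops):
        # Phase 1: vote "no" if a delete targets a row this node does not hold.
        for kind, key, _row in ops:
            if kind == "delete" and key not in self.rows:
                return False
        self.staged = list(ops)
        return True

    def commit(self):
        # Phase 2: apply the staged operations.
        for kind, key, row in self.staged:
            if kind == "put":
                self.rows[key] = row
            else:
                del self.rows[key]
        self.staged = []

    def abort(self):
        # Phase 2 (failure path): discard anything that was staged.
        self.staged = []


def node_for(row, nodes):
    # Assumed split policy: integer split column modulo the node count.
    return nodes[row["shard_key"] % len(nodes)]


def two_phase_commit(plan):
    """plan is a list of (node, ops) pairs; returns True if committed."""
    if all(node.prepare(ops) for node, ops in plan):
        for node, _ops in plan:
            node.commit()
        return True
    for node, _ops in plan:
        node.abort()
    return False


def insert(nodes, row):
    return two_phase_commit([(node_for(row, nodes), [("put", row["id"], row)])])


def delete(nodes, row):
    return two_phase_commit([(node_for(row, nodes), [("delete", row["id"], None)])])


def update(nodes, old_row, new_row):
    src, dst = node_for(old_row, nodes), node_for(new_row, nodes)
    if src is dst:
        # Record stays on the same node: update in place.
        plan = [(src, [("put", new_row["id"], new_row)])]
    else:
        # Record moves: delete on the source node, insert on the destination.
        plan = [(src, [("delete", old_row["id"], None)]),
                (dst, [("put", new_row["id"], new_row)])]
    return two_phase_commit(plan)


nodes = [Node("n0"), Node("n1"), Node("n2")]
old = {"id": 1, "shard_key": 4, "name": "a"}
new = {"id": 1, "shard_key": 5, "name": "a"}
insert(nodes, old)       # stored on n1, since 4 % 3 == 1
update(nodes, old, new)  # moved to n2, since 5 % 3 == 2
```

A real coordinator would also need durable logging and timeout handling, which this sketch leaves out.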

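A single stage of a complex select (here a group by across nodes) might look like the following sketch. It assumes each node is an in-memory SQLite database holding a hypothetical `sales` table; `MAP_SQL`, `REDUCE_SQL`, `make_node`, and `run_stage` are illustrative names, and the CRC32-based routing is just one possible shuffle rule. A full job would chain several such stages in DAG order.

```python
import sqlite3
import zlib

# mapsql runs on every storage node; reducesql runs on every reducer.
MAP_SQL = "SELECT dept, SUM(amount), COUNT(*) FROM sales GROUP BY dept"
REDUCE_SQL = "SELECT dept, SUM(total), SUM(cnt) FROM partial GROUP BY dept"


def make_node(rows):
    """Hypothetical storage node: an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (dept TEXT, amount INTEGER)")
    db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return db


def run_stage(nodes, num_reducers=2):
    # Map: each node aggregates its local rows with mapsql.
    partials = [row for db in nodes for row in db.execute(MAP_SQL)]

    # Shuffle: route each partial row to one reducer by its group key,
    # so all partial rows for the same dept meet on the same reducer.
    buckets = [[] for _ in range(num_reducers)]
    for row in partials:
        buckets[zlib.crc32(row[0].encode()) % num_reducers].append(row)

    # Reduce: each reducer merges the partial aggregates with reducesql.
    result = []
    for bucket in buckets:
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE partial (dept TEXT, total INTEGER, cnt INTEGER)")
        db.executemany("INSERT INTO partial VALUES (?, ?, ?)", bucket)
        result.extend(db.execute(REDUCE_SQL).fetchall())
        db.close()
    return sorted(result)


stage_nodes = [make_node([("a", 10), ("b", 5)]), make_node([("a", 7), ("c", 1)])]
print(run_stage(stage_nodes))  # one (dept, total, count) tuple per department
```

Pushing the partial `SUM` and `COUNT` into mapsql keeps the shuffled data small, which is the same role a combiner plays in MapReduce.
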


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org

