hadoop-common-issues mailing list archives

From "jiang hehui (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-13304) distributed database for store , mapreduce for compute
Date Tue, 21 Jun 2016 08:46:57 GMT
jiang hehui created HADOOP-13304:

             Summary: distributed database for store , mapreduce for compute
                 Key: HADOOP-13304
                 URL: https://issues.apache.org/jira/browse/HADOOP-13304
             Project: Hadoop Common
          Issue Type: New Feature
          Components: fs
    Affects Versions: 2.6.4
            Reporter: jiang hehui

In Hadoop, HDFS is responsible for storage and MapReduce is responsible for computation.
My idea is that data is stored in a distributed database, while computation works like MapReduce.


* insert:
Using two-phase commit, execute the insert on the nodes chosen by the split policy.

* delete:
Using two-phase commit, execute the delete on the nodes chosen by the split policy.

* update:
Using two-phase commit and the split policy: if the record stays on the same node, execute
the update on that node; if the record moves to a different node, first delete the old value
on the source node, then insert the new value on the destination node.
* select:
** A simple select (the data is on a single node, or no merging of data across nodes is
needed) is handled just as on a standalone database server.
** A complex select (distinct, group by, order by, subquery, or join across multiple nodes)
is what we call a job.
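
The insert, delete, and update paths above can be sketched roughly as follows. This is a minimal in-memory illustration, not the proposed implementation: the `Node` class, the `split_policy` function (routing by key modulo the node count), and the `fail_on_prepare` hook are all hypothetical names and choices made for the example. The issue does not specify a split policy or a commit protocol API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """In-memory stand-in for one database node (hypothetical)."""
    name: str
    rows: dict = field(default_factory=dict)    # committed rows: key -> value
    staged: list = field(default_factory=list)  # operations awaiting commit
    fail_on_prepare: bool = False               # hook to force a "no" vote

    def prepare(self, ops):
        # Phase 1: stage the operations and vote yes (True) or no (False).
        if self.fail_on_prepare:
            return False
        self.staged = list(ops)
        return True

    def commit(self):
        # Phase 2 on success: apply the staged operations.
        for op, key, value in self.staged:
            if op == "put":
                self.rows[key] = value
            else:  # "delete"
                self.rows.pop(key, None)
        self.staged = []

    def rollback(self):
        # Phase 2 on failure: discard the staged operations.
        self.staged = []


def split_policy(key, node_count):
    """Assumed split policy: route an integer key by modulo."""
    return key % node_count


def two_phase_commit(nodes, ops_by_index):
    """Prepare on every participant, then commit all or roll back all."""
    participants = [nodes[i] for i in ops_by_index]
    votes = [nodes[i].prepare(ops) for i, ops in ops_by_index.items()]
    if all(votes):
        for node in participants:
            node.commit()
        return True
    for node in participants:
        node.rollback()
    return False


def insert(nodes, key, value):
    target = split_policy(key, len(nodes))
    return two_phase_commit(nodes, {target: [("put", key, value)]})


def delete(nodes, key):
    target = split_policy(key, len(nodes))
    return two_phase_commit(nodes, {target: [("delete", key, None)]})


def update(nodes, old_key, new_key, value):
    src = split_policy(old_key, len(nodes))
    dst = split_policy(new_key, len(nodes))
    if src == dst:
        # The record stays on the same node: update in place.
        ops = {src: [("delete", old_key, None), ("put", new_key, value)]}
    else:
        # The record moves: delete on the source, insert on the destination.
        ops = {src: [("delete", old_key, None)], dst: [("put", new_key, value)]}
    return two_phase_commit(nodes, ops)
```

In this sketch, an update that moves a record touches two participants, so a "no" vote from either one leaves both nodes unchanged.
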
{color:red}A job is parsed into stages. Stages have lineage, and all stages in a job make up
a DAG (directed acyclic graph). Every stage contains a mapsql, a shuffle, and a reducesql.
When a SQL request is received, an execution plan is generated from the metadata. The plan
contains the DAG, including the stages and the mapsql, shuffle, and reducesql in each stage.
The plan is then executed and the result is returned to the client.
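
As a rough illustration of a single stage, the sketch below runs a group-by across two nodes. Everything here is an assumption made for the example: the `Stage` class, the `orders` and `shuffled` table names, the use of in-memory SQLite databases as stand-ins for data nodes, and CRC32 hashing for the shuffle. A real job would chain several such stages into a DAG, which is not shown.

```python
import sqlite3
import zlib
from dataclasses import dataclass


@dataclass
class Stage:
    """One stage of a job: mapsql, then shuffle, then reducesql (hypothetical)."""
    mapsql: str       # runs on every data node
    shuffle_col: int  # index of the map-output column used for partitioning
    reducesql: str    # runs on each reducer over the table `shuffled`


def make_node(rows):
    """Create an in-memory SQLite database standing in for one data node."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (city TEXT, amount INTEGER)")
    db.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return db


def run_stage(stage, nodes, reducer_count=2):
    # Map: run mapsql on every node and collect the partial results.
    partials = []
    for db in nodes:
        partials.extend(db.execute(stage.mapsql).fetchall())

    # Shuffle: rows with the same key always go to the same reducer.
    buckets = [[] for _ in range(reducer_count)]
    for row in partials:
        key = str(row[stage.shuffle_col]).encode()
        buckets[zlib.crc32(key) % reducer_count].append(row)

    # Reduce: each reducer loads its bucket and runs reducesql on it.
    result = []
    for bucket in buckets:
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE shuffled (city TEXT, amount INTEGER)")
        db.executemany("INSERT INTO shuffled VALUES (?, ?)", bucket)
        result.extend(db.execute(stage.reducesql).fetchall())
        db.close()
    return sorted(result)


# Plan for: SELECT city, SUM(amount) FROM orders GROUP BY city
stage = Stage(
    mapsql="SELECT city, SUM(amount) FROM orders GROUP BY city",
    shuffle_col=0,
    reducesql="SELECT city, SUM(amount) FROM shuffled GROUP BY city",
)
nodes = [
    make_node([("beijing", 10), ("shanghai", 5)]),
    make_node([("beijing", 7), ("shenzhen", 3)]),
]
totals = run_stage(stage, nodes)
print(totals)  # [('beijing', 17), ('shanghai', 5), ('shenzhen', 3)]
```

The mapsql computes partial sums per node, so only one row per city per node crosses the shuffle; the reducesql then adds the partial sums for each city.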

This is the same as in Spark: an RDD corresponds to a table, and a job to a job.
It is also the same as MapReduce in Hadoop: mapsql is map, shuffle is shuffle, reducesql is

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
