hadoop-mapreduce-dev mailing list archives

From Alex Xia <x...@attinteractive.com>
Subject Proposal
Date Tue, 13 Sep 2011 23:15:13 GMT
I am not sure this is the right place to post this. Please leave your comments or let me know
where I should post it.

Thanks

Alex

$1 Purpose
    This document proposes some ideas for building a new framework on top of MapReduce.

$2 Issues addressed
    For ETL and analytic purposes, there are some issues in the current development process.
I will try to address the following:
    a) Programming MapReduce jobs directly is not very productive, and it is unnecessary for
business analysts.
    b) There is no job management system. We can use Oozie, but Oozie is not a good job management
system.

$3 Proposal Highlights
    I propose a new framework interface and summarize the highlights as follows:
    a) The framework exposes a shell scripting interface and all programming is done in script.
    b) All data in the interface are objects; each object exposes its key and value through
methods (see the sketch after this list).
    c) The interface exposes SQL-like statements to manipulate the objects read from input and
written to output.
    d) The interface provides job management functionality.
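
    As a rough illustration of highlight b), the record object could look something like the
Java interface below; the names are hypothetical, not an existing API.

        // Hypothetical record abstraction: each object carries a key and a value
        // that the scripting layer reads and writes through methods.
        public interface Record {
            Object getKey();
            void setKey(Object key);
            Object getValue();
            void setValue(Object value);
        }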

$4 Possible syntax
    The proposed interface is built on top of MapReduce. The data in the system can be partitioned
into small data sets and processed as streams of objects. For example, to process a web log,
we can partition the traffic by visitor ID and process the data in parallel.
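
    In plain Hadoop terms, partitioning the traffic by visitor ID could be expressed with a
custom Partitioner; the class below is only a sketch of that idea, and the class name and
key/value types are assumptions.

        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Partitioner;

        // Sketch: send every log record for the same visitor ID to the same
        // reduce task so one visitor's traffic is processed together.
        public class VisitorIdPartitioner extends Partitioner<Text, Text> {
            @Override
            public int getPartition(Text visitorId, Text logLine, int numPartitions) {
                return (visitorId.hashCode() & Integer.MAX_VALUE) % numPartitions;
            }
        }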

a) script
    The interface will have all the usual scripting functionality, like control flow and loops.
It may be built on "BeanShell".

b) data manipulation
    The interface reads a stream of objects from the low-level framework, manipulates the objects
one by one, and writes a stream of objects back to the low-level framework. It may also partition
data into small sets and group data by key.

    The possible syntax looks like this (one reading of how the clauses map onto the MapReduce
phases is sketched after the list), and there are some examples in the following section.
        select {}
        partition by {}
        group key {}
        group value {}
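
    My interpretation of these clauses, based on the examples below rather than a fixed design:

        select {}         -> map: read input objects and emit key/value objects
        partition by {}   -> partitioner: decide which reducer receives each object
        group key {}      -> grouping: extract the key used to group objects
        group value {}    -> reduce: aggregate the grouped objects and write output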

c) job control
    The interface will have job control functionality, like submitJob, runJob, listJob, and
deleteJob (a possible usage is sketched below).
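
    A possible usage of these calls; the names and exact semantics are illustrative only.

        id = submitJob(myJob);     // submit asynchronously, returns a job handle
        runJob(myJob);             // submit and block until the job finishes
        listJob();                 // list known jobs and their status
        deleteJob(id);             // remove a finished or failed job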

$5 Examples
    I use the word-count example from MapReduce. This example counts word occurrences in a
document.

        setInput(indir)
        setOutput(outDir)
        // map step: emit (word, 1) for every word in the input line
        select {
            array = SYS.read().split(' ');
            foreach i in array
            {
                obj.setKey(i);
                obj.setValue(1);
                SYS.write(obj);
            }
        }
        // partition by {}    (default partitioning is used here)
        // group the emitted objects by the word itself
        group key {
            SYS.getObject().getKey();
        }
        // reduce step: the count is the number of objects grouped under a key
        group value {
            array = SYS.read();
            obj = array[0];
            obj.setValue(array.size);
            SYS.write(obj);
        };
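
    For comparison, the same word count written directly against the Hadoop MapReduce API looks
roughly like the standard WordCount example; the sketch below omits the driver and job setup.

        import java.io.IOException;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.Reducer;

        // The select {} block above plays the role of this Mapper.
        class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split(" ")) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }

        // The group key {} and group value {} blocks play the role of this Reducer.
        class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }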

    This example submits two jobs and waits for them.

        define job1 { select ... };    // uses a select statement like the one above
        define job2 { select ... };
        id1 = submitJob(job1);
        id2 = submitJob(job2);

        jobList.add(id1);
        jobList.add(id2);

        waitFor(jobList);

        if (id1.status() == "Complete")
            println("complete");
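
    Underneath, submitJob/waitFor could be built on the org.apache.hadoop.mapreduce.Job API.
A minimal sketch, assuming two already-configured Job instances and omitting error handling:

        import java.util.ArrayList;
        import java.util.List;
        import org.apache.hadoop.mapreduce.Job;

        public class JobControlSketch {
            // Sketch: submit two pre-configured jobs asynchronously, then poll
            // until both complete, mirroring submitJob()/waitFor() above.
            public static void submitAndWait(Job job1, Job job2) throws Exception {
                List<Job> jobs = new ArrayList<Job>();
                job1.submit();
                job2.submit();
                jobs.add(job1);
                jobs.add(job2);

                boolean allDone = false;
                while (!allDone) {
                    allDone = true;
                    for (Job j : jobs) {
                        if (!j.isComplete()) {
                            allDone = false;
                        }
                    }
                    if (!allDone) {
                        Thread.sleep(5000);   // poll every five seconds
                    }
                }

                if (job1.isSuccessful()) {
                    System.out.println("complete");
                }
            }
        }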
