hadoop-mapreduce-issues mailing list archives

From "Greg Roelofs (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1220) Implement an in-cluster LocalJobRunner
Date Tue, 08 Mar 2011 23:01:05 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Roelofs updated MAPREDUCE-1220:
------------------------------------

    Attachment: MR-1220.v1b.sshot-02-jobdetails.jsp.png

Screenshot of the jobdetails.jsp page (uber-job still running).

The "Job Scheduling information" line shows up again here, and the top table and the title
of the graph are also modified. These are trivial changes, but they give the user a clue in
case the optimization is less transparent than intended.

> Implement an in-cluster LocalJobRunner
> --------------------------------------
>
>                 Key: MAPREDUCE-1220
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1220
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: client, jobtracker
>            Reporter: Arun C Murthy
>            Assignee: Greg Roelofs
>         Attachments: MAPREDUCE-1220_yhadoop20.patch, MR-1220.v1.trunk-hadoop-common.Progress-dumper.patch.txt,
> MR-1220.v10e-v11c-v12b.ytrunk-hadoop-mapreduce.delta.patch.txt, MR-1220.v13.ytrunk-hadoop-mapreduce.delta.patch.txt,
> MR-1220.v14b.ytrunk-hadoop-mapreduce.delta.patch.txt, MR-1220.v15.ytrunk-hadoop-mapreduce.delta.patch.txt,
> MR-1220.v1b.sshot-02-jobdetails.jsp.png, MR-1220.v2.trunk-hadoop-mapreduce.patch.txt, MR-1220.v2.trunk-hadoop-mapreduce.patch.txt,
> MR-1220.v2b.sshot-01-jobtracker.jsp.png, MR-1220.v6.ytrunk-hadoop-mapreduce.patch.txt, MR-1220.v7.ytrunk-hadoop-mapreduce.delta.patch.txt,
> MR-1220.v8b.ytrunk-hadoop-mapreduce.delta.patch.txt, MR-1220.v9c.ytrunk-hadoop-mapreduce.delta.patch.txt
>
>
> Currently, very small map-reduce jobs suffer from latency issues due to overheads in Hadoop
> Map-Reduce such as scheduling, JVM startup, etc. We've periodically tried to optimize all parts
> of the framework to achieve lower latencies.
> I'd like to turn the problem around a little bit. I propose we allow very small jobs to run
> as a single-task job with multiple maps and reduces, i.e., similar to our current implementation
> of the LocalJobRunner. Thus, under certain conditions (maybe a user-set configuration, or if the
> input data is small, i.e., less than a DFS block size) we could launch a special task which will
> run all maps serially, followed by the reduces. This would really help small jobs achieve
> significantly lower latencies, thanks to less scheduling and JVM-startup overhead, no shuffle
> over the network, etc.
> This would be a huge benefit to small Hive/Pig queries, especially on large clusters.
> Thoughts?
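
The proposal above combines two ideas: an eligibility check (user opt-in, or input smaller than one DFS block) and a single task that runs every map serially and then every reduce, so no network shuffle is needed. The Java sketch below illustrates both under stated assumptions. The class and method names (UberTaskSketch, isSmallJob, runSerially) are hypothetical, are not part of Hadoop's API, and are not taken from the attached patches. Word count stands in for the job, and an in-memory sorted map per reduce partition stands in for the shuffle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; these names are illustrative, not Hadoop's API.
class UberTaskSketch {

    /** Eligible if the user opted in, or if the whole input is smaller than one DFS block. */
    static boolean isSmallJob(boolean userEnabled, long inputBytes, long dfsBlockSize) {
        return userEnabled || inputBytes < dfsBlockSize;
    }

    /** Runs every map serially, then every reduce, inside one task (word count as the job). */
    static List<Map<String, Integer>> runSerially(List<List<String>> splits, int numReduces) {
        // One sorted in-memory partition per reduce stands in for the network shuffle.
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int r = 0; r < numReduces; r++) {
            partitions.add(new TreeMap<>());
        }

        // Map phase: the splits are processed one after another, not in parallel tasks.
        for (List<String> split : splits) {
            for (String line : split) {
                for (String word : line.split("\\s+")) {
                    if (word.isEmpty()) {
                        continue;
                    }
                    int p = Math.floorMod(word.hashCode(), numReduces); // hash partitioning
                    partitions.get(p).computeIfAbsent(word, k -> new ArrayList<>()).add(1);
                }
            }
        }

        // Reduce phase: starts only after every map has finished.
        List<Map<String, Integer>> outputs = new ArrayList<>();
        for (TreeMap<String, List<Integer>> partition : partitions) {
            Map<String, Integer> out = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> e : partition.entrySet()) {
                int sum = 0;
                for (int v : e.getValue()) {
                    sum += v;
                }
                out.put(e.getKey(), sum);
            }
            outputs.add(out);
        }
        return outputs;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB DFS block size
        long inputBytes = 1024;             // assumed size of the tiny input
        List<List<String>> splits = List.of(
                List.of("the quick fox", "the lazy dog"),
                List.of("the end"));
        if (isSmallJob(false, inputBytes, blockSize)) {
            // Prints one sorted word-count map per reduce partition.
            System.out.println(runSerially(splits, 2));
        }
    }
}
```

Because everything runs inside one task, the job pays for a single scheduling round and a single JVM, and the "shuffle" is just an in-memory sort. Buffering all intermediate data in memory is only reasonable here because eligible jobs are, by the check above, small.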

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
