hadoop-mapreduce-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1295) We need a job trace manipulator to build gridmix runs.
Date Wed, 23 Dec 2009 02:59:29 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-1295:

    Status: Open  (was: Patch Available)

The test failures are known (MAPREDUCE-1311, MAPREDUCE-1312).

Only a few minor nits:
* The two types of DeskewedJobTraceReader constructors could be combined by adding a private/protected
constructor that takes a JobTraceReader parameter
* Should the System.err messages in DJTR::nextJob be debug messages?
* The open/close idiom in {{run}} may be replaced by {{FileSystem::exists}}. Alternatively,
require the user to provide a clean directory and use sequential segment numbering
* The first person in the debug log messages is a little disorienting, and the messages could use log4j
loggers, but whatever you prefer
* The {{deletees}} and similar accounting can be replaced with {{FileSystem::deleteOnExit}}
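
For the first nit, a minimal sketch of the constructor consolidation might look like the following. This is not the code from the patch: {{StubTraceReader}} and {{DeskewedReaderSketch}} are hypothetical stand-ins for JobTraceReader and DeskewedJobTraceReader, and the two input types (a list and a comma-separated string) are assumptions chosen only to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Rumen's JobTraceReader; it only iterates over job names.
class StubTraceReader {
    private final Iterator<String> jobs;

    StubTraceReader(List<String> jobs) {
        this.jobs = jobs.iterator();
    }

    // Returns the next job name, or null when the trace is exhausted.
    String getNext() {
        return jobs.hasNext() ? jobs.next() : null;
    }
}

// Hypothetical stand-in for DeskewedJobTraceReader.
class DeskewedReaderSketch {
    private final StubTraceReader reader;

    // First kind of constructor: the caller already has a list of jobs.
    DeskewedReaderSketch(List<String> jobs) {
        this(new StubTraceReader(jobs));
    }

    // Second kind of constructor: the caller has a comma-separated string.
    DeskewedReaderSketch(String commaSeparatedJobs) {
        this(new StubTraceReader(Arrays.asList(commaSeparatedJobs.split(","))));
    }

    // Single private constructor taking the reader; all shared setup lives here.
    private DeskewedReaderSketch(StubTraceReader reader) {
        if (reader == null) {
            throw new IllegalArgumentException("reader must not be null");
        }
        this.reader = reader;
    }

    String nextJob() {
        return reader.getNext();
    }
}

class ConstructorChainingSketch {
    public static void main(String[] args) {
        DeskewedReaderSketch fromList = new DeskewedReaderSketch(Arrays.asList("job1", "job2"));
        DeskewedReaderSketch fromString = new DeskewedReaderSketch("job1,job2");
        System.out.println(fromList.nextJob());
        System.out.println(fromString.nextJob());
    }
}
```

Both public entry points only translate their arguments into a reader, so any initialization added later has exactly one place to go.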

> We need a job trace manipulator to build gridmix runs.
> ------------------------------------------------------
>                 Key: MAPREDUCE-1295
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1295
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: mapreduce-1295--2009-12-17.patch, mapreduce-1295--2009-12-21.patch,
mapreduce-1295--2009-12-22.patch, mapreduce-1297--2009-12-14.patch
> Rumen produces "job traces", which are JSON format files describing important aspects
of all jobs that are run [successfully or not] on a hadoop map/reduce cluster.  There are
two packages under development that will consume these trace files and produce actions in
that cluster or another cluster: gridmix3 [see jira MAPREDUCE-1124 ] and Mumak [a simulator
-- see MAPREDUCE-728 ].
> It would be useful to be able to do two things with job traces, so we can run experiments
using these two tools: change the duration, and change the density.  I would like to provide
a "folder", a tool that can wrap a long-duration execution trace to redistribute its jobs
over a shorter interval, and also change the density by duplicating or culling away jobs from
the folded combined job trace.
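
As a rough illustration of the folding idea described above, the sketch below wraps job submit times into a shorter interval. It assumes that "wrapping" means taking each job's offset from the earliest submit time modulo the output duration; the issue text does not specify the algorithm, and {{TraceFoldSketch}} and {{fold}} are hypothetical names. Changing the density by duplicating or culling jobs is not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; not part of Rumen or the attached patches.
final class TraceFoldSketch {
    private TraceFoldSketch() {}

    // Wraps each submit time into [0, outputDurationMs), measured from the
    // earliest job in the input, and returns the folded times in ascending order.
    static List<Long> fold(List<Long> submitTimesMs, long outputDurationMs) {
        if (outputDurationMs <= 0) {
            throw new IllegalArgumentException("outputDurationMs must be positive");
        }
        List<Long> folded = new ArrayList<>();
        if (submitTimesMs.isEmpty()) {
            return folded;
        }
        long start = Collections.min(submitTimesMs);
        for (long submitTime : submitTimesMs) {
            // Assumption: folding is a plain modulo of the offset from the trace start.
            folded.add((submitTime - start) % outputDurationMs);
        }
        Collections.sort(folded);
        return folded;
    }

    public static void main(String[] args) {
        List<Long> submitTimes = Arrays.asList(1000L, 4000L, 7500L, 12000L);
        // An 11-second trace folded into a 5-second window.
        System.out.println(fold(submitTimes, 5000L)); // prints [0, 1000, 1500, 3000]
    }
}
```

Jobs that were far apart in the original trace can land close together after folding, which is why the result is re-sorted before use.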

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
