hadoop-mapreduce-issues mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1594) Support for Sleep Jobs in gridmix
Date Tue, 16 Mar 2010 07:22:27 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845752#action_12845752 ]

Hong Tang commented on MAPREDUCE-1594:
--------------------------------------

- Overall, the approach looks fine, although the implementation is still a bit hacky: users
still need to specify input/output directories when running sleep jobs, even though those
directories should be ignored. I am fine with it for now, as the structure will likely keep
evolving as more extensions are added to Gridmix.
- Style comment: Consider changing JobType to JobCreator; it sounds more natural to call JobCreator.createGridmixJob(...).
- Structure-wise, it would be better to rename GridmixJob to LoadJob and to create a common
base class (probably abstract) for LoadJob and SleepJob, called GridmixJob, that contains
*only* the parts shared by LoadJob and SleepJob. E.g. outdir may belong only to LoadJob
and not to SleepJob. (BTW, are File{Input,Output}Format.set{Input,Output}Path needed for SleepJob.call()?)
- SleepInputFormat.createRecordReader should return a record reader whose consecutive keys
match the expected wakeup times of the mapper process. Something like the following:
{noformat}
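      // duration (total sleep in ms), RINTERVAL (ms between records) and LOG are
      // expected to come from the enclosing scope.
      return new RecordReader<LongWritable, LongWritable>() {
        long start = -1;   // monotonic time base in ms, set on the first call
        long slept = 0L;   // ms accounted for by previous records
        long sleep = 0L;   // length of the current sleep interval in ms
        final LongWritable key = new LongWritable();
        final LongWritable val = new LongWritable();

        @Override
        public boolean nextKeyValue() throws IOException {
          if (start == -1) {
            start = System.nanoTime()/1000000;
          }
          slept += sleep;
          sleep = Math.min(duration - slept, RINTERVAL);
          key.set(slept + sleep + start);   // absolute wakeup time for this record
          val.set(duration - slept);        // ms remaining before this record's sleep
          return slept < duration;
        }

        @Override
        public float getProgress() throws IOException {
          // Guard against a zero duration, which would otherwise yield NaN.
          return duration == 0 ? 1.0f : slept / ((float) duration);
        }

        @Override
        public LongWritable getCurrentKey() {
          return key;
        }

        @Override
        public LongWritable getCurrentValue() {
          return val;
        }

        @Override
        public void close() throws IOException {
          final String msg = "Slept for " + duration;
          LOG.info(msg);
        }

        @Override
        public void initialize(InputSplit split, TaskAttemptContext ctxt) {
          // Nothing to initialize; the time base is set lazily in nextKeyValue().
        }
      };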
{noformat}

Accordingly, SleepMapper.map(...) should be modified as follows:
{noformat}
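    public void map(LongWritable key, LongWritable value, Context context)
      throws IOException, InterruptedException {
      // key is the absolute wakeup time in ms; value is the remaining time in ms.
      context.setStatus("Sleeping... " + value.get() + " ms left");
      long now = System.nanoTime()/1000000;
      // Sleep only until the wakeup time, so per-call overhead is absorbed.
      if (now < key.get()) {
        TimeUnit.MILLISECONDS.sleep(key.get()-now);
      }
    }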
{noformat}
This prevents the actual sleep time from deviating from the expected sleep time, and keeps
the error from accumulating over many map() calls.
- A similar idea should be applied to SleepReducer too.
- Do I read it right that by default the mapper updates its progress once every 10 seconds?
It would be better to set RINTERVAL = Math.min(1 sec, totalDuration/20) so that the reported
map task progress is smoother. (Unfortunately, the reduce progress may not be very useful.)
- SleepJob.getSuccessfulAttemptInfo() should issue a warning if no successful attempt is
found.
- SleepJob.buildSplits() should not use InputStriper at all. At the end, set the locations
of SleepSplit to "new String[0]" instead of "striper.splitFor(inputDir, 512, 3).getLocations()".
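
To make the wakeup-time and RINTERVAL points above concrete, here is a minimal, self-contained
sketch in plain Java with no Hadoop dependencies. It is an illustration, not the patch. The
class SleepSchedule, its methods and the overheadMs parameter are hypothetical names, and the
1 ms floor on the interval is an added assumption to avoid a zero interval for very short
durations. It generates the same key sequence as the record reader suggested above, then
compares, on a simulated clock, fixed-length sleeps against sleeping until each absolute deadline.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Gridmix.
final class SleepSchedule {
  private SleepSchedule() {}

  // Reporting interval in ms: min(1 second, duration / 20), floored at 1 ms (assumption).
  static long reportInterval(long durationMs) {
    return Math.max(1L, Math.min(1000L, durationMs / 20));
  }

  // Absolute wakeup deadlines in ms, mirroring the keys of the suggested record reader.
  static List<Long> wakeupDeadlines(long startMs, long durationMs, long intervalMs) {
    if (intervalMs <= 0) {
      throw new IllegalArgumentException("intervalMs must be positive");
    }
    List<Long> deadlines = new ArrayList<>();
    long slept = 0L;
    while (slept < durationMs) {
      long sleep = Math.min(durationMs - slept, intervalMs);
      deadlines.add(startMs + slept + sleep);
      slept += sleep;
    }
    return deadlines;
  }

  // Simulated finish time when every call sleeps its full interval after overheadMs of work.
  static long finishRelative(long startMs, List<Long> deadlines, long overheadMs) {
    long now = startMs;
    long previous = startMs;
    for (long deadline : deadlines) {
      now += overheadMs + (deadline - previous); // overhead is never recovered
      previous = deadline;
    }
    return now;
  }

  // Simulated finish time when every call sleeps only until its absolute deadline.
  static long finishAbsolute(long startMs, List<Long> deadlines, long overheadMs) {
    long now = startMs;
    for (long deadline : deadlines) {
      now += overheadMs;             // per-call overhead
      now = Math.max(now, deadline); // sleep only if the deadline is still ahead
    }
    return now;
  }

  public static void main(String[] args) {
    long duration = 10000L;
    long interval = reportInterval(duration);
    List<Long> deadlines = wakeupDeadlines(0L, duration, interval);
    System.out.println("interval ms: " + interval + ", records: " + deadlines.size());
    System.out.println("relative finish: " + finishRelative(0L, deadlines, 5L));
    System.out.println("absolute finish: " + finishAbsolute(0L, deadlines, 5L));
  }
}
```

In this simulation the fixed-length variant finishes late by the per-call overhead times the
number of records, while the deadline-based variant finishes at the requested duration as long
as the overhead stays below the interval.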


> Support for Sleep Jobs in gridmix
> ---------------------------------
>
>                 Key: MAPREDUCE-1594
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1594
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: contrib/gridmix
>            Reporter: rahul k singh
>         Attachments: 1594-yhadoop-20-1xx-1.patch, 1594-yhadoop-20-1xx.patch
>
>
> Support for Sleep jobs in gridmix

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

