hama-dev mailing list archives

From "Ikhtiyor Ahmedov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-781) Setting partition split fails in local mode when file size is big and has a runtime partition (HashParitioner)
Date Thu, 25 Jul 2013 10:37:48 GMT

    [ https://issues.apache.org/jira/browse/HAMA-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719478#comment-13719478 ]

Ikhtiyor Ahmedov commented on HAMA-781:
---------------------------------------

The file names in the part of the code you mentioned are the same as the input path, in my case ratings1M.dat.

Findings so far:
The file names are the same regardless of the number of tasks.
The difference is in the condition {code:title=BSPJobClient.java_line_562}job.get("bsp.partitioning.runner.job")
== null{code}
which evaluates to false for numTasks = 2, 3, 5 and to true for numTasks = 4.
Why is "bsp.partitioning.runner.job" null?
This configuration is set in only one place: the partition() method in BSPJobClient.java, line 440.
It is never set there because the condition on line 411 blocks that code path:
(numTasks != numSplits) == false and Constants.ENABLE_RUNTIME_PARTITIONING == false (this
configuration is set to true by default in 2 files, TestPartitioning and GraphJob).
As a result the setting stays null, which affects the following
condition:
{code:title=BSPJobClient.java_line_560}
if (split.getClass().getName().equals(FileSplit.class.getName())
    && job.getConfiguration().get(Constants.RUNTIME_PARTITIONING_CLASS) != null
    && job.get("bsp.partitioning.runner.job") == null) {
  LOG.debug(((FileSplit) split).getPath().getName());
  String[] extractPartitionID = ((FileSplit) split).getPath().getName()
      .split("[-]");
  rawSplit.setPartitionID(Integer.parseInt(extractPartitionID[1]));
}
{code}
where 
{code:title=BSPJobClient.java_line_564}
// ((FileSplit) split).getPath().getName() == "ratings1M.dat"
String[] extractPartitionID = ((FileSplit) split).getPath().getName()
    .split("[-]"); // == ["ratings1M.dat"], 1 element
rawSplit.setPartitionID(Integer.parseInt(extractPartitionID[1])); // index 1 is out of bounds
{code}
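
The failure can be reproduced outside Hama with plain string handling. The following is only a minimal sketch: the class PartitionIdParser and the method parsePartitionId are hypothetical names and not part of Hama, and the assumption that partitioned files are named like "part-00003" is inferred from the split("[-]") call above, not confirmed by the source. It returns an empty result instead of throwing when the file name carries no partition ID.

```java
import java.util.OptionalInt;

// Hypothetical helper, not part of Hama. It assumes partition files are named
// "<prefix>-<number>" and reports a missing ID instead of throwing.
final class PartitionIdParser {

  static OptionalInt parsePartitionId(String fileName) {
    String[] parts = fileName.split("[-]");
    if (parts.length < 2) {
      // No '-' in the name, e.g. "ratings1M.dat": there is no element at index 1.
      return OptionalInt.empty();
    }
    try {
      return OptionalInt.of(Integer.parseInt(parts[1]));
    } catch (NumberFormatException e) {
      // The part after '-' is not a number.
      return OptionalInt.empty();
    }
  }

  public static void main(String[] args) {
    // The raw input file name splits into a single element.
    System.out.println("ratings1M.dat".split("[-]").length); // prints 1
    System.out.println(parsePartitionId("ratings1M.dat").isPresent());
    System.out.println(parsePartitionId("part-00003").isPresent());
  }
}
```

A caller could then skip setPartitionID when the result is empty. Whether that is the right behaviour for BSPJobClient is a separate question.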

This is the code block that causes the situation:
{code:title=BSPJobClient.java_line_411}
if ((numTasks > 0 && numTasks != numSplits)
          || (job.getConfiguration().getBoolean(
              Constants.ENABLE_RUNTIME_PARTITIONING, false) && job
              .getConfiguration().get(Constants.RUNTIME_PARTITIONING_CLASS) != null)) {

        if (numTasks == 0) {
          numTasks = numSplits;
        }

        HamaConfiguration conf = new HamaConfiguration(job.getConfiguration());

        conf.setInt(Constants.RUNTIME_DESIRED_PEERS_COUNT, numTasks);
        if (job.getConfiguration().get(Constants.RUNTIME_PARTITIONING_DIR) != null) {
          conf.set(Constants.RUNTIME_PARTITIONING_DIR, job.getConfiguration()
              .get(Constants.RUNTIME_PARTITIONING_DIR));
        }
        if (job.getConfiguration().get(Constants.RUNTIME_PARTITIONING_CLASS) != null) {
          conf.set(Constants.RUNTIME_PARTITIONING_CLASS,
              job.get(Constants.RUNTIME_PARTITIONING_CLASS));
        }
        BSPJob partitioningJob = new BSPJob(conf);
        LOG.debug("partitioningJob input: " + partitioningJob.get(Constants.JOB_INPUT_DIR));
        partitioningJob.setInputFormat(job.getInputFormat().getClass());
        partitioningJob.setInputKeyClass(job.getInputKeyClass());
        partitioningJob.setInputValueClass(job.getInputValueClass());
        partitioningJob.setOutputFormat(NullOutputFormat.class);
        partitioningJob.setOutputKeyClass(NullWritable.class);
        partitioningJob.setOutputValueClass(NullWritable.class);
        partitioningJob.setBspClass(PartitioningRunner.class);
        partitioningJob.set("bsp.partitioning.runner.job", "true");
        partitioningJob.getConfiguration().setBoolean(
            Constants.ENABLE_RUNTIME_PARTITIONING, false);
        partitioningJob.setOutputPath(partitionDir);

        boolean isPartitioned = false;
        try {
          isPartitioned = partitioningJob.waitForCompletion(true);
        } catch (InterruptedException e) {
          LOG.error("Interrupted partitioning run-time.", e);
        } catch (ClassNotFoundException e) {
          LOG.error("Class not found error partitioning run-time.", e);
        }

        if (isPartitioned) {
          if (job.getConfiguration().get(Constants.RUNTIME_PARTITIONING_DIR) != null) {
            job.setInputPath(new Path(conf
                .get(Constants.RUNTIME_PARTITIONING_DIR)));
          } else {
            job.setInputPath(partitionDir);
          }
          job.setBoolean("input.has.partitioned", true);
          job.setInputFormat(NonSplitSequenceFileInputFormat.class);
        } else {
          LOG.error("Error partitioning the input path.");
          throw new IOException("Runtime partition failed for the job.");
        }
      }
{code}
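
The guard on line 411 can be modeled in isolation to see which task counts skip the partitioning job. This is only a sketch of the boolean structure quoted above: PartitionGuard and runsPartitioningJob are hypothetical names, and numSplits = 4 is an assumption chosen to match the observation that only numTasks = 4 fails.

```java
// Hypothetical model of the guard quoted above. It mirrors only the boolean
// structure of the condition, not the real BSPJobClient code.
final class PartitionGuard {

  static boolean runsPartitioningJob(int numTasks, int numSplits,
      boolean runtimePartitioningEnabled, String partitionerClass) {
    return (numTasks > 0 && numTasks != numSplits)
        || (runtimePartitioningEnabled && partitionerClass != null);
  }

  public static void main(String[] args) {
    int numSplits = 4; // assumption: the input file yields 4 splits
    for (int numTasks : new int[] {2, 3, 4, 5}) {
      // ENABLE_RUNTIME_PARTITIONING is modeled as false; a partitioner class is set.
      boolean runs = runsPartitioningJob(numTasks, numSplits, false, "HashPartitioner");
      System.out.println("numTasks=" + numTasks + " -> partitioning job runs: " + runs);
    }
  }
}
```

Under these assumptions the partitioning job is skipped only when numTasks equals numSplits and runtime partitioning is disabled, which is the case where the input keeps its original file name.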

If we remove the condition numTasks != numSplits, everything seems to work fine.
                
> Setting partition split fails in local mode when file size is big and has a runtime partition (HashParitioner)
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HAMA-781
>                 URL: https://issues.apache.org/jira/browse/HAMA-781
>             Project: Hama
>          Issue Type: Bug
>          Components: bsp core
>            Reporter: Ikhtiyor Ahmedov
>            Priority: Minor
>         Attachments: HAMA-781.patch
>
>
> When the input partitioner is set to HashPartitioner and the file size is big in local mode,
> line 566 of BSPJobClient.java throws an index-out-of-bounds exception.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
