hama-dev mailing list archives

From "Suraj Menon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-700) BSPPartitioner should be configurable to be optional and allow input format conversion
Date Tue, 15 Jan 2013 05:20:13 GMT

    [ https://issues.apache.org/jira/browse/HAMA-700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553497#comment-13553497 ]

Suraj Menon commented on HAMA-700:

> 1) Why did you choose to introduce the getPartitionId() method in PartitioningRunner?
Two reasons. 
- The Partitioner is used in the partitioning job as well as in the user-submitted job. Depending
on the RecordConverter class, the type of input records for the partitioner job differs from the
type of input records for the user job.
- In the future, when we have scalable messaging and a better scheduler, we can inject a partitioning
superstep instead of starting a new partitioner job. For HAMA-561, where the partitions are not
changed but only converted to Vertex, the partition id would be the same as the peer
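The following is a minimal sketch of the first reason in plain Java, with no Hama dependencies. `Converter`, `HashPartitionerSketch` and `PartitionIdSketch` are hypothetical stand-ins, not Hama's RecordConverter or Partitioner API. The tab-separated line format is also an assumption. The sketch shows why the partition id is best computed where the converter is known. The partitioning job sees raw text lines and the user job sees already-converted records, but both must hash the same converted key to agree on a partition.

```java
// Hypothetical stand-in for a RecordConverter; not Hama's actual interface.
interface Converter<I, O> {
  O convert(I raw);
}

// Hypothetical hash partitioner in the spirit of a HashPartitioner.
final class HashPartitionerSketch {
  static int partition(Object key, int numTasks) {
    // Clear the sign bit so negative hash codes still map into [0, numTasks).
    return (key.hashCode() & Integer.MAX_VALUE) % numTasks;
  }
}

// Hypothetical runner-side helper: convert first, then partition the converted
// key, so the partitioner never has to know the raw input record type.
final class PartitionIdSketch {
  static <I, K> int getPartitionId(I raw, Converter<I, K> keyOf, int numTasks) {
    K convertedKey = keyOf.convert(raw);
    return HashPartitionerSketch.partition(convertedKey, numTasks);
  }
}

class PartitionIdDemo {
  public static void main(String[] args) {
    int numTasks = 4;
    // Partitioning job: raw line "vertexId<TAB>neighbors" (assumed format).
    Converter<String, String> fromText = line -> line.split("\t", 2)[0];
    // User job: the record is already converted, so the key is used as-is.
    Converter<String, String> identity = id -> id;
    int inPartitioningJob = PartitionIdSketch.getPartitionId("42\t1,2,3", fromText, numTasks);
    int inUserJob = PartitionIdSketch.getPartitionId("42", identity, numTasks);
    System.out.println(inPartitioningJob == inUserJob); // true
  }
}
```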

> 2) The goals of the patch:
- to provide a means in bsp core that could be reused in the graph module to do run-time partitioning
- to make the graph job independent of the user's data input format (e.g. TextInputFormat, SequenceFileFormat)
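For the second goal, here is a sketch of how per-format converters could let the graph job consume a single vertex type, whatever the input format. `VertexRecord`, `VertexConverter`, `TextLineConverter` and `KeyValueConverter` are hypothetical names, and the `id<TAB>n1,n2` text layout is an assumption. A `Map.Entry` stands in for whatever key/value pair a sequence-file reader would produce.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified vertex record that the graph job would consume.
final class VertexRecord {
  final String id;
  final List<String> edges;

  VertexRecord(String id, List<String> edges) {
    this.id = id;
    this.edges = edges;
  }
}

// Hypothetical converter contract: one implementation per input format.
interface VertexConverter<R> {
  VertexRecord toVertex(R record);
}

// Text input, one vertex per line: "id<TAB>n1,n2,..." (assumed layout).
final class TextLineConverter implements VertexConverter<String> {
  @Override
  public VertexRecord toVertex(String line) {
    String[] parts = line.split("\t", 2);
    List<String> edges = new ArrayList<>();
    if (parts.length > 1 && !parts[1].isEmpty()) {
      for (String neighbor : parts[1].split(",")) {
        edges.add(neighbor.trim());
      }
    }
    return new VertexRecord(parts[0], edges);
  }
}

// Key/value input: key = vertex id, value = adjacency list (assumed layout).
final class KeyValueConverter implements VertexConverter<Map.Entry<String, List<String>>> {
  @Override
  public VertexRecord toVertex(Map.Entry<String, List<String>> kv) {
    return new VertexRecord(kv.getKey(), new ArrayList<>(kv.getValue()));
  }
}

class ConverterDemo {
  public static void main(String[] args) {
    VertexRecord fromText = new TextLineConverter().toVertex("1\t2,3");
    VertexRecord fromPair = new KeyValueConverter()
        .toVertex(new AbstractMap.SimpleEntry<>("1", Arrays.asList("2", "3")));
    // Both formats yield the same vertex, so the graph job need not know which was used.
    System.out.println(fromText.id.equals(fromPair.id) && fromText.edges.equals(fromPair.edges)); // true
  }
}
```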

I am sorry, but I am a little lost on the suggestion. We chose to implement run-time partitioning
in the partition runner because, eventually, both are doing the same thing.
I am already guilty of doubling the storage of vertices. We can consider intermediate stages
(that write to local files instead of HDFS) when we implement a BSPPartitioner that is injected into
the execution in terms of the task count specified for the job.

> 3) It is up for vote. Vertex.write and readFields use it; we can use the vertex.runner.conf

> 4) Sure, I just carried it over from the previous version. If you make it generic, then you have
to specify the classes in the configuration of the job.
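Point 4 can be made concrete. Java generics are erased at run time, so a generic vertex cannot call `new V()` inside `readFields`. It has to learn the concrete class from the job configuration instead. The sketch makes three assumptions. `SimpleWritable` is a simplified interface modeled on Hadoop's `Writable` write/readFields pair. A plain `Map` stands in for the job configuration. The key `sketch.vertex.value.class` is made up. None of these are Hama's actual names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Hadoop's Writable contract (hypothetical).
interface SimpleWritable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Example value type; reflection needs its public no-arg constructor.
final class IntValue implements SimpleWritable {
  int value;
  public IntValue() {}
  IntValue(int v) { value = v; }
  public void write(DataOutput out) throws IOException { out.writeInt(value); }
  public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

// Generic vertex: V is erased at run time, so readFields looks up the
// concrete value class in the (hypothetical) job configuration.
final class GenericVertex<V extends SimpleWritable> {
  static final String VALUE_CLASS_KEY = "sketch.vertex.value.class"; // hypothetical key
  private final Map<String, String> conf;
  String id;
  V value;

  GenericVertex(Map<String, String> conf) { this.conf = conf; }

  void write(DataOutput out) throws IOException {
    out.writeUTF(id);
    value.write(out);
  }

  @SuppressWarnings("unchecked") // the configured class is trusted to be a V
  void readFields(DataInput in) throws IOException {
    id = in.readUTF();
    String className = conf.get(VALUE_CLASS_KEY);
    if (className == null) {
      throw new IOException("missing configuration key " + VALUE_CLASS_KEY);
    }
    try {
      value = (V) Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IOException("cannot instantiate " + className, e);
    }
    value.readFields(in);
  }
}

class GenericVertexDemo {
  public static void main(String[] args) throws IOException {
    Map<String, String> conf = new HashMap<>();
    conf.put(GenericVertex.VALUE_CLASS_KEY, IntValue.class.getName());

    GenericVertex<IntValue> original = new GenericVertex<>(conf);
    original.id = "a";
    original.value = new IntValue(7);
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    original.write(new DataOutputStream(bytes));

    GenericVertex<IntValue> copy = new GenericVertex<>(conf);
    copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    System.out.println(copy.id + " " + copy.value.value); // a 7
  }
}
```

The reflection step is the cost of making the vertex generic. Without the class name in the configuration, `readFields` has no way to create the value object.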

> 5) It is used when the user just wants to convert the records but run with the number of tasks
equal to the number of splits.
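For point 5, here is a hypothetical helper showing how the task count could follow from the chosen mode. The `Mode` values and the method are illustrative assumptions, not Hama configuration options.

```java
// Hypothetical modes; not actual Hama configuration values.
final class TaskCountSketch {
  enum Mode { PARTITION, CONVERT_ONLY }

  static int numTasks(Mode mode, int requestedTasks, int numSplits) {
    if (mode == Mode.CONVERT_ONLY) {
      // Records are only converted: one task per existing input split.
      return numSplits;
    }
    if (requestedTasks <= 0) {
      throw new IllegalArgumentException("requestedTasks must be positive");
    }
    // Records are re-partitioned into the number of tasks the user asked for.
    return requestedTasks;
  }

  public static void main(String[] args) {
    System.out.println(numTasks(Mode.CONVERT_ONLY, 8, 3)); // 3
    System.out.println(numTasks(Mode.PARTITION, 8, 3)); // 8
  }
}
```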

The direction we have taken in partitioning is still open for suggestions and a vote. I committed
directly because it was difficult for me to keep the patch up to date with the commits on the
same issue. From the next patch onward, I will follow the process of uploading the patch, getting it
reviewed, and then committing.
Thanks for reviewing.

> BSPPartitioner should be configurable to be optional and allow input format conversion
> --------------------------------------------------------------------------------------
>                 Key: HAMA-700
>                 URL: https://issues.apache.org/jira/browse/HAMA-700
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp core
>            Reporter: Suraj Menon
>            Assignee: Suraj Menon
>             Fix For: 0.6.1
>         Attachments: HAMA-700.patch_Jan7, HAMA-700.patch.v2, HAMA-700-v1.patch
> There should be a provision for skipping the PartitionRunner if needed. Also, we can
> have a RecordConverter interface so that the PartitionRunner can read the input in any format
> and create new splits.
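The description's two requests can be sketched as a single partitioning pass. Each raw record goes through a converter and is routed into one of `numTasks` new splits. Skipping the runner would simply mean the job reads its original splits unchanged. `PartitionPassSketch` and its parameters are hypothetical, and in-memory lists stand in for the split files a real runner would write.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical partitioning pass; in-memory lists stand in for split files.
final class PartitionPassSketch {
  static <I, O> List<List<O>> partition(List<I> input, Function<I, O> converter,
      Function<O, ?> keyOf, int numTasks) {
    List<List<O>> newSplits = new ArrayList<>();
    for (int i = 0; i < numTasks; i++) {
      newSplits.add(new ArrayList<>());
    }
    for (I raw : input) {
      // Convert from the input format to the record type the job expects.
      O record = converter.apply(raw);
      // Route by the converted key so equal keys land in the same new split.
      int pid = (keyOf.apply(record).hashCode() & Integer.MAX_VALUE) % numTasks;
      newSplits.get(pid).add(record);
    }
    return newSplits;
  }
}

class PartitionPassDemo {
  public static void main(String[] args) {
    List<String> lines = Arrays.asList("a\t1", "b\t2", "a\t3");
    Function<String, String> toColon = line -> line.replace('\t', ':');
    Function<String, Object> vertexId = rec -> rec.split(":", 2)[0];
    System.out.println(PartitionPassSketch.partition(lines, toColon, vertexId, 2));
    // prints [[b:2], [a:1, a:3]]
  }
}
```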

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
