hama-dev mailing list archives

From "Edward J. Yoon (Resolved) (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HAMA-560) Partitioning should be done in parallel
Date Fri, 20 Apr 2012 04:27:37 GMT

     [ https://issues.apache.org/jira/browse/HAMA-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon resolved HAMA-560.

    Resolution: Duplicate

Duplicate of HAMA-531.
> Partitioning should be done in parallel
> ---------------------------------------
>                 Key: HAMA-560
>                 URL: https://issues.apache.org/jira/browse/HAMA-560
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.4.0
>            Reporter: praveen sripati
> Currently, partitioning happens on the node from which the job is submitted, in
> BSPJobClient#submitJobInternal(). The partitioning runs sequentially, which will become a
> bottleneck as the input data size grows. With partitioning done in parallel, the completion
> time for the job should also improve.
> Here are some options to evaluate:
> - Use multiple threads to do the partitioning in BSPJobClient#partition(). This is an
>   easy fix, but the partitioning is still restricted to a single node, and there might be a
>   problem with simultaneous writes to the same file.
> - Use MapReduce (MR) to partition the data. We need to check whether an MR job can be
>   kicked off from BSPJobClient#partition() to partition the input data. The number of
>   reducers should be set to the number of BSP tasks.
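
The first option (multiple threads on a single node) could be sketched roughly as below. This is only an illustration and not Hama code: the class name, the hash-based partitionFor() helper, and the use of in-memory lists in place of partition files are all assumptions made for the example. Each worker thread fills its own private buffers, and a single thread merges them afterwards, which is one way to sidestep the concern about simultaneous writes to the same file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not part of Hama: partitions records with several threads.
class ParallelPartitionSketch {

    // Assumed partitioner: hash the record into one of numTasks partitions.
    static int partitionFor(String record, int numTasks) {
        return Math.floorMod(record.hashCode(), numTasks);
    }

    // Splits the input into one slice per thread, partitions each slice into
    // thread-local buffers, then merges the buffers on the calling thread.
    static List<List<String>> partition(List<String> records, int numTasks, int numThreads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<List<List<String>>>> futures = new ArrayList<>();
            int chunk = (records.size() + numThreads - 1) / numThreads;
            for (int t = 0; t < numThreads; t++) {
                int from = Math.min(records.size(), t * chunk);
                int to = Math.min(records.size(), from + chunk);
                List<String> slice = records.subList(from, to);
                Callable<List<List<String>>> task = () -> {
                    // Buffers private to this task: no shared writes.
                    List<List<String>> local = new ArrayList<>();
                    for (int p = 0; p < numTasks; p++) {
                        local.add(new ArrayList<>());
                    }
                    for (String r : slice) {
                        local.get(partitionFor(r, numTasks)).add(r);
                    }
                    return local;
                };
                futures.add(pool.submit(task));
            }
            // Single-threaded merge, standing in for writing one file per partition.
            List<List<String>> merged = new ArrayList<>();
            for (int p = 0; p < numTasks; p++) {
                merged.add(new ArrayList<>());
            }
            for (Future<List<List<String>>> f : futures) {
                List<List<String>> local = f.get();
                for (int p = 0; p < numTasks; p++) {
                    merged.get(p).addAll(local.get(p));
                }
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> records = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        List<List<String>> parts = partition(records, 3, 2);
        for (int p = 0; p < parts.size(); p++) {
            System.out.println("partition " + p + ": " + parts.get(p));
        }
    }
}
```

As the issue notes, this still keeps all of the work on the submitting node; the MR option would spread it across the cluster instead.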
