hama-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HAMA-531) Data re-partitioning in BSPJobClient
Date Mon, 10 Dec 2012 07:55:20 GMT

     [ https://issues.apache.org/jira/browse/HAMA-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-531:

    Attachment: patch_v02.txt

This patch fixes the unit tests except for the weighted graph example (SSSP). Once all are done, I'll fix

My plan is to partition the input data using a BSP job. Each task processes a single
input data block and writes its output files into a destination directory. Finally, the
files are merged. This way, the number of partitions can be set to any desired number.
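The plan above can be sketched in plain Java as a minimal, non-authoritative illustration. It assumes that records are assigned to partitions by hashing (the message does not say how records are routed), simulates the output files as in-memory lists, and runs the "tasks" one after another, whereas a real BSP job would run them in parallel. The class and method names (RepartitionSketch, runTask, merge, repartition) are hypothetical and are not part of Hama's API.

```java
import java.util.ArrayList;
import java.util.List;

public class RepartitionSketch {

    // Assumption: route a record to a partition by hashing it.
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(String record, int numPartitions) {
        return Math.floorMod(record.hashCode(), numPartitions);
    }

    // One hypothetical "task": processes a single input block and writes one
    // output (simulated as an in-memory list) per destination partition.
    static List<List<String>> runTask(List<String> block, int numPartitions) {
        List<List<String>> outputs = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            outputs.add(new ArrayList<>());
        }
        for (String record : block) {
            outputs.get(partitionFor(record, numPartitions)).add(record);
        }
        return outputs;
    }

    // Merge step: concatenate every task's output for partition p, in task order.
    static List<List<String>> merge(List<List<List<String>>> taskOutputs, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            List<String> merged = new ArrayList<>();
            for (List<List<String>> output : taskOutputs) {
                merged.addAll(output.get(p));
            }
            partitions.add(merged);
        }
        return partitions;
    }

    // One task per input block, then a single merge into numPartitions partitions.
    public static List<List<String>> repartition(List<List<String>> blocks, int numPartitions) {
        List<List<List<String>>> taskOutputs = new ArrayList<>();
        for (List<String> block : blocks) {
            taskOutputs.add(runTask(block, numPartitions));
        }
        return merge(taskOutputs, numPartitions);
    }

    public static void main(String[] args) {
        // Three input blocks of different sizes, re-partitioned into 4 partitions.
        List<List<String>> blocks = List.of(
                List.of("a", "b", "c"),
                List.of("d", "e"),
                List.of("f", "g", "h", "i"));
        List<List<String>> partitions = repartition(blocks, 4);
        for (int p = 0; p < partitions.size(); p++) {
            System.out.println("partition " + p + ": " + partitions.get(p));
        }
    }
}
```

Because every task writes one output per destination partition, the number of partitions is decoupled from the number of input blocks: three blocks become four partitions here, and the client never has to read and rewrite the data itself.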

> Data re-partitioning in BSPJobClient
> ------------------------------------
>                 Key: HAMA-531
>                 URL: https://issues.apache.org/jira/browse/HAMA-531
>             Project: Hama
>          Issue Type: Improvement
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>            Priority: Critical
>         Attachments: HAMA-531_1.patch, HAMA-531_2.patch, HAMA-531_final.patch, patch.txt,
> Re-partitioning the data is a very expensive operation. Currently, BSPJobClient
> performs the read/write operations sequentially on the client side using the HDFS API.
> This can cause "too many open files" errors, incurs HDFS overhead, and performs slowly.
> We have to find another way to re-partition the data.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
