incubator-hama-dev mailing list archives

From "Thomas Jungblut (JIRA)" <>
Subject [jira] [Commented] (HAMA-531) Data re-partitioning in BSPJobClient
Date Mon, 21 May 2012 13:16:41 GMT


Thomas Jungblut commented on HAMA-531:

Two possible approaches:

We schedule a BSP job that writes the data to a given number of files,
OR we use the same logic as the graph repair: a first superstep reads all the input
and then distributes it among the tasks.

I think the second approach is quite simple.
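
To illustrate the "read everything in a first superstep, then distribute" idea, here is a minimal sketch. It simulates the tasks in plain Python and does not use the Hama API; hash partitioning on the record key is an assumption, and the names `target_task`, `repartition`, `splits` and `num_tasks` are hypothetical.

```python
from collections import defaultdict
import zlib


def target_task(key: str, num_tasks: int) -> int:
    """Pick the destination task for a key (hash partitioning is an assumption)."""
    # crc32 is stable across runs, unlike Python's salted built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def repartition(splits, num_tasks):
    """Simulate one 'read and redistribute' superstep.

    splits: one list of (key, value) records per task, i.e. the
    unpartitioned input that each task happens to read.
    Returns one list of records per task after the exchange.
    """
    # First superstep: every task reads its own split and "sends" each
    # record to the task that owns the record's key.
    outboxes = defaultdict(list)
    for records in splits:
        for key, value in records:
            outboxes[target_task(key, num_tasks)].append((key, value))

    # Barrier sync: afterwards each task holds exactly the records it owns.
    return [outboxes[task] for task in range(num_tasks)]


if __name__ == "__main__":
    splits = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)], [("b", 5)]]
    for task, records in enumerate(repartition(splits, num_tasks=3)):
        print(task, records)
```

In a real job the exchange would be message sends followed by a barrier sync, so the reads and writes happen in parallel on the tasks rather than sequentially on the client.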

> Does anyone know how it is done in Giraph?

I don't know, but I'd bet on the second approach, since their mapper input is unlikely to be partitioned already.

> Data re-partitioning in BSPJobClient
> ------------------------------------
>                 Key: HAMA-531
>                 URL:
>             Project: Hama
>          Issue Type: Improvement
>            Reporter: Edward J. Yoon
> Re-partitioning the data is a very expensive operation. Currently, we process
> read/write operations sequentially using the HDFS API in BSPJobClient on the client side.
> This can cause "too many open files" errors, incurs HDFS overhead, and performs slowly.
> We have to find another way to re-partition the data.


