hama-dev mailing list archives

From "Suraj Menon (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HAMA-543) Make best effort to start BSP Task on the host where the input split is located.
Date Mon, 02 Apr 2012 04:48:26 GMT

     [ https://issues.apache.org/jira/browse/HAMA-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Suraj Menon updated HAMA-543:

    Attachment: HAMA-543-locality.patch

Hi, this is a comparatively dirty hack on top of the current source code. I would like
someone to review it, especially because I have changed a few things that were assumed to
be multi-threaded to be single-threaded.

While working on it, I realized that this won't necessarily improve performance, because
the resource requirements of Hama are different from those of Hadoop. This change moves
tasks closer to their input, as Hadoop does for mapper tasks. But in Hama, a task keeps
running on the same machine throughout its lifetime. If, in search of data locality, the
tasks get scheduled such that communication between the nodes is costlier than normal
(e.g. tasks resident in separate racks), then this change would degrade performance.
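
As a rough illustration of the concern above, the following Java sketch counts how many
pairs of tasks end up on hosts in different racks for a given placement. The class name
RackSpread, the method crossRackPairs, and the host-to-rack map are hypothetical and are
not part of Hama or of the attached patch. The pair count is only a crude stand-in for
communication cost, assuming every task may talk to every other task.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Hama's API. */
final class RackSpread {

    /**
     * Counts the pairs of tasks whose hosts sit in different racks.
     * A host with no known rack is treated as being in a different rack
     * from every other host (a pessimistic assumption).
     */
    static int crossRackPairs(List<String> taskHosts, Map<String, String> rackOfHost) {
        int pairs = 0;
        for (int i = 0; i < taskHosts.size(); i++) {
            for (int j = i + 1; j < taskHosts.size(); j++) {
                String rackA = rackOfHost.get(taskHosts.get(i));
                String rackB = rackOfHost.get(taskHosts.get(j));
                if (rackA == null || rackB == null || !rackA.equals(rackB)) {
                    pairs++;
                }
            }
        }
        return pairs;
    }
}
```

A placement that follows the input splits across racks would score higher here than one
that keeps all tasks in a single rack, which is the trade-off described above.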

While discussing the issue, Thomas and I felt that network topology information should
be more important for scheduling jobs than data locality for the first superstep. We felt
that HAMA-519 could be a good start for providing input for this. I see that it is already
scheduled for 0.6. I can provide the test cases if we decide to push this into the 0.5 release.

From the patch, I would like to know whether making a single TaskWorker schedule all tasks
is fine or not. This would be important for my future patches. So even if this patch is not
really important, I would appreciate a review.
> Make best effort to start BSP Task on the host where the input split is located.
> --------------------------------------------------------------------------------
>                 Key: HAMA-543
>                 URL: https://issues.apache.org/jira/browse/HAMA-543
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.6.0
>            Reporter: Suraj Menon
>             Fix For: 0.6.0
>         Attachments: HAMA-543-locality.patch
> Currently, a BSP task is not scheduled on the host where HDFS stores its input split.
> The BSP task scheduler should attempt to start the task on the host where the input
> split is located.
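
A minimal Java sketch of what "best effort" placement could look like, assuming each task
knows the hosts that hold its input split and each host advertises a number of free task
slots. The names LocalityPlacement, pickHost, and assignAll are hypothetical. This is not
the code in HAMA-543-locality.patch and does not use Hama's scheduler classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch for illustration only; not Hama's scheduler API. */
final class LocalityPlacement {

    /**
     * Picks a host for one task: the first split location that still has a
     * free slot, otherwise the host with the most free slots, otherwise null.
     */
    static String pickHost(List<String> splitLocations, Map<String, Integer> freeSlots) {
        for (String host : splitLocations) {
            if (freeSlots.getOrDefault(host, 0) > 0) {
                return host; // data-local placement
            }
        }
        String fallback = null;
        int best = 0;
        for (Map.Entry<String, Integer> entry : freeSlots.entrySet()) {
            if (entry.getValue() > best) {
                best = entry.getValue();
                fallback = entry.getKey();
            }
        }
        return fallback; // non-local placement, or null if no slot is free
    }

    /** Assigns tasks in order, consuming one slot per placement. */
    static List<String> assignAll(List<List<String>> splitLocationsPerTask,
                                  Map<String, Integer> freeSlots) {
        // Work on a copy so the caller's slot counts are left untouched.
        Map<String, Integer> remaining = new LinkedHashMap<>(freeSlots);
        List<String> placement = new ArrayList<>();
        for (List<String> locations : splitLocationsPerTask) {
            String host = pickHost(locations, remaining);
            if (host == null) {
                throw new IllegalStateException("no free task slots left");
            }
            remaining.merge(host, -1, Integer::sum);
            placement.add(host);
        }
        return placement;
    }
}
```

The fallback here looks only at free slots. It ignores network topology, so it can still
spread tasks across racks.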
