incubator-hama-dev mailing list archives

From "ChiaHung Lin" <chl...@nuk.edu.tw>
Subject Re: Summary of problems with HAMA-413 and Discussion
Date Tue, 30 Aug 2011 03:35:53 GMT
From the JIRA log, it appears that the committed patch lets each BSP peer report its status
directly to the master. An issue we may need to consider now is: how can we determine whether
a groom server has failed? With the original mechanism, a groom server manages its tasks (BSP peers)
and the master takes care of the groom servers. For instance, if a groom server fails, the master can
reschedule all tasks assigned to that groom server onto other working ones. With the current mechanism,
the master, in addition to monitoring the activity of the groom servers, also needs to deal with
the BSP peers. Do we already have a plan for this?
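
To make the original mechanism concrete, here is a minimal sketch of heartbeat-based failure detection on the master side. It is only an illustration under stated assumptions: the class `GroomMonitor`, its methods, and the timeout policy are hypothetical names made up for this example and are not taken from the Hama code base or from the HAMA-413 patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; not Hama's actual master or groom classes. */
public class GroomMonitor {
    private final long timeoutMillis;
    // Last time (in ms) each groom server was heard from.
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    // Tasks each groom server reported as running in its last heartbeat.
    private final Map<String, List<String>> tasksByGroom = new HashMap<>();

    public GroomMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Record a heartbeat from a groom server together with its current tasks. */
    public synchronized void heartbeat(String groom, List<String> taskIds, long nowMillis) {
        lastHeartbeat.put(groom, nowMillis);
        tasksByGroom.put(groom, new ArrayList<>(taskIds));
    }

    /**
     * Drop every groom server that has been silent for longer than the timeout
     * and return the tasks that were assigned to it, so the caller can
     * reschedule them on other working groom servers.
     */
    public synchronized List<String> expire(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                dead.add(e.getKey());
            }
        }
        List<String> orphaned = new ArrayList<>();
        for (String groom : dead) {
            lastHeartbeat.remove(groom);
            orphaned.addAll(tasksByGroom.remove(groom));
        }
        return orphaned;
    }
}
```

In this sketch the master only tracks groom servers; each groom aggregates the state of its own BSP peers into its heartbeat. If the peers report to the master directly, the master would need a comparable table per task, which is the extra bookkeeping I am asking about.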

-----Original message-----
From:Edward J. Yoon <edwardyoon@apache.org>
To:hama-dev@incubator.apache.org <hama-dev@incubator.apache.org>
Date:Fri, 26 Aug 2011 15:11:56 +0900
Subject:Re: Summary of problems with HAMA-413 and Discussion

Okay.

Sent from my iPhone

On 2011. 8. 26., at 2:49 PM, "ChiaHung Lin" <chl501@nuk.edu.tw> wrote:

> The latest patch (HAMA_NEW.patch) for HAMA-413 still seems to use the BSP peer to report its
> status back to the master.
> 
> +        umbilical.updateTaskStatusAndReport(taskid);
> 
> +  public void updateTaskStatusAndReport(TaskAttemptID taskid) {
> ...
> +    doReport(taskStatus);
> +  }
> 
> Is there any chance of reverting to a version in which the GroomServer reports task status,
> so that we can base the discussion on that version? This is just to ensure that the following
> issues are not caused by the code change above.
> 
> -----Original message-----
> From:Edward J. Yoon <edwardyoon@apache.org>
> To:hama-dev@incubator.apache.org
> Date:Thu, 25 Aug 2011 19:43:48 +0900
> Subject:Summary of problems with HAMA-413 and Discussion
> 
> Today, I tested all Hama examples on my cluster of 32 nodes, with 96
> tasks. Pi and Serialized Printing examples were working fine but
> 
> 1. Barrier Synchronizations are not working well (with a 'bench' example).
> 2. When an unexpected shutdown occurs, the ZK nodes (which are created by
> each BSPPeer) are not deleted. There's no way to clean them up before
> rebooting the server.
> 3. Graph examples are not working.
> 4. Status reports between Groom and Master are too frequent.
> 5. And, there are many code issues that can be improved.
> 
> Issues 1 and 2 are already reported (see HAMA-387, HAMA-407). Work on
> some of issues 3, 4, and 5 has already been started by ChiaHung Lin.
> 
> Should all of these issues be fixed in HAMA-413, or should we just
> commit HAMA-413?
> 
> Thanks.
> -- 
> Best Regards, Edward J. Yoon
> @eddieyoon
> 
> 
> --
> ChiaHung Lin
> Department of Information Management
> National University of Kaohsiung
> Taiwan


--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan
