hadoop-common-dev mailing list archives

From Owen O'Malley <o...@yahoo-inc.com>
Subject Re: [jira] Commented: (HADOOP-639) task cleanup messages can get lost, causing task trackers to keep tasks forever
Date Wed, 22 Nov 2006 19:38:26 GMT

On Nov 22, 2006, at 9:00 AM, Nigel Daley wrote:

> Arun, the proposal looks good.  If the JT always gets a stale seqNo  
> from the TT (because of some unrecoverable problem in the TT), will  
> it send the saved response forever?  Or should there be some  
> maximum resends?

I think that a heartbeat whose SeqNo doesn't match shouldn't count
toward the 10-minute task tracker timeout. So if a task tracker gets
stuck, it will be declared lost after 10 minutes.
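
As a rough illustration of the timeout rule, here is a minimal Java sketch. It is not the actual JobTracker code: the class name, method names, and the choice to start SeqNos at 0 are all hypothetical. The only point it shows is that a heartbeat with a mismatched SeqNo leaves the "last good heartbeat" time untouched, so a stuck tracker still expires.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; names and structure are assumptions, not Hadoop's API. */
public class HeartbeatTimeoutSketch {
    // The 10-minute task tracker timeout mentioned in the thread.
    static final long TRACKER_TIMEOUT_MS = 10 * 60 * 1000L;

    private final Map<String, Integer> expectedSeqNo = new HashMap<>();
    private final Map<String, Long> lastGoodHeartbeat = new HashMap<>();

    /** Registers a tracker at time nowMs; the first expected SeqNo is assumed to be 0. */
    public void register(String tracker, long nowMs) {
        expectedSeqNo.put(tracker, 0);
        lastGoodHeartbeat.put(tracker, nowMs);
    }

    /** Returns true if the SeqNo matched and the timeout clock was refreshed. */
    public boolean heartbeat(String tracker, int seqNo, long nowMs) {
        Integer expected = expectedSeqNo.get(tracker);
        if (expected == null || expected != seqNo) {
            // Stale or unknown SeqNo: do not refresh the timeout clock.
            return false;
        }
        expectedSeqNo.put(tracker, seqNo + 1);
        lastGoodHeartbeat.put(tracker, nowMs);
        return true;
    }

    /** A tracker is lost once more than the timeout has passed since its last good heartbeat. */
    public boolean isLost(String tracker, long nowMs) {
        Long last = lastGoodHeartbeat.get(tracker);
        return last == null || nowMs - last > TRACKER_TIMEOUT_MS;
    }
}
```

With this shape, a tracker that keeps sending the same stale SeqNo never extends its own deadline, which also bounds how long the saved response gets resent.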

> Also, when the JT is resending a JTResponse, can it add or change  
> the list of actions?  Or do they need to be identical?

For a first pass I'd require that they be identical. If the actions  
change, you need to assign a new SeqNo and track both the old and new  
SeqNo. Furthermore, piling more work on a task tracker that is  
running behind doesn't sound like a good strategy.
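
To make the "identical resend" rule concrete, here is a small Java sketch under stated assumptions. The class and field names are hypothetical, and I am assuming the tracker reports the SeqNo of the last response it received; none of this is taken from the real JobTracker. The saved response is immutable and is returned as-is on a mismatch, so newly pending actions wait until the tracker catches up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; names and protocol details are assumptions. */
public class ResendSketch {
    /** An immutable response: a SeqNo plus a fixed list of actions. */
    public static final class Response {
        public final int seqNo;
        public final List<String> actions;

        Response(int seqNo, List<String> actions) {
            this.seqNo = seqNo;
            this.actions = Collections.unmodifiableList(new ArrayList<>(actions));
        }
    }

    // Last response sent to each tracker, kept so it can be resent unchanged.
    private final Map<String, Response> lastResponse = new HashMap<>();

    /**
     * ackedSeqNo is the SeqNo of the last response the tracker says it received.
     * On a mismatch the saved response is resent unchanged; pendingActions are
     * ignored until the tracker acknowledges the saved response.
     */
    public Response heartbeat(String tracker, int ackedSeqNo, List<String> pendingActions) {
        Response saved = lastResponse.get(tracker);
        if (saved != null && ackedSeqNo != saved.seqNo) {
            return saved; // identical resend: same SeqNo, same actions
        }
        int nextSeqNo = (saved == null) ? 0 : saved.seqNo + 1;
        Response next = new Response(nextSeqNo, pendingActions);
        lastResponse.put(tracker, next);
        return next;
    }
}
```

Only one response per tracker is ever tracked here, which is what requiring identical resends buys; allowing the actions to change would mean tracking both the old and the new SeqNo.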

> Is it possible that a TT can get the same JTResponse more than  
> once?  If so, does the TT need to recognize this?

No. The RPC framework and the fact that only one task will be sending  
heartbeats will prevent that.

-- Owen
