hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-312) Connections should not be cached
Date Mon, 07 Aug 2006 20:07:14 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-312?page=comments#action_12426312 ] 
            
Devaraj Das commented on HADOOP-312:
------------------------------------

I agree with this. In the current code, the JobTracker assumes that a TaskTracker is dead only
after the TaskTracker has been out of contact for 10 minutes. Unfortunately, even with this
large timeout, a TaskTracker sometimes still cannot get through. Yes, the accept queue can be
made longer, but we will hit the problem again later when we have more clients. So, in addition
to increasing the accept queue size, do you think it makes sense to have a two-way heartbeat
here? That is, if a server hasn't received a heartbeat from a client and the expiry timeout is
about to run out, the server contacts the client itself, probably by invoking a GETSTATUS (or
similar) method on it. If that method returns a valid response, the server keeps the client
alive for another expiry-timeout interval, and so on. We could also look at other approaches;
some of them are outlined in HADOOP-362. By the way, the patch for HADOOP-181 should handle the
lost-tracker problem, but this kind of problem might turn up in any client-server interaction.
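A minimal Java sketch of the two-way heartbeat described above, assuming the server is able to call back into the client. `TwoWayHeartbeatMonitor`, `heartbeat`, `check`, `isTracked` and the `probe` callback (standing in for a GETSTATUS-style RPC) are hypothetical names, not Hadoop IPC APIs. Times are passed in explicitly so the logic is easy to follow and test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch of a two-way heartbeat; names are illustrative, not Hadoop APIs.
class TwoWayHeartbeatMonitor {
    private final long expiryMillis;       // e.g. 600_000 for the current 10-minute timeout
    private final long probeMarginMillis;  // start probing this long before expiry
    private final Predicate<String> probe; // assumed GETSTATUS-style call; true = valid response
    private final Map<String, Long> lastHeard = new ConcurrentHashMap<>();

    TwoWayHeartbeatMonitor(long expiryMillis, long probeMarginMillis, Predicate<String> probe) {
        this.expiryMillis = expiryMillis;
        this.probeMarginMillis = probeMarginMillis;
        this.probe = probe;
    }

    // Client-initiated heartbeat: renews the client's lease.
    void heartbeat(String clientId, long nowMillis) {
        lastHeard.put(clientId, nowMillis);
    }

    boolean isTracked(String clientId) {
        return lastHeard.containsKey(clientId);
    }

    // Periodic sweep; returns the clients declared dead at nowMillis.
    List<String> check(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeard.entrySet()) {
            String id = e.getKey();
            long silentFor = nowMillis - e.getValue();
            if (silentFor < expiryMillis - probeMarginMillis) {
                continue; // heard from recently enough; nothing to do
            }
            // Expiry is near: the server contacts the client itself.
            if (probe.test(id)) {
                lastHeard.put(id, nowMillis); // valid response: keep alive for another interval
            } else if (silentFor >= expiryMillis) {
                dead.add(id); // no heartbeat and no response: declare the client lost
            }
        }
        for (String id : dead) {
            lastHeard.remove(id);
        }
        return dead;
    }
}
```

A real server would presumably run `check` from a periodic timer thread and issue the probes asynchronously, so that one slow client does not stall the whole sweep; the sketch calls the probe inline for clarity.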

> Connections should not be cached
> --------------------------------
>
>                 Key: HADOOP-312
>                 URL: http://issues.apache.org/jira/browse/HADOOP-312
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Devaraj Das
>         Assigned To: Devaraj Das
>         Attachments: no_connection_caching.patch, no_connection_caching.patch
>
>
> Servers and clients (clients include datanodes, tasktrackers, DFSClients & tasks)
> should not cache connections, or should cache them only for very short periods of time. Clients should
> set up & tear down connections to the servers every time they need to contact the servers
> (including for heartbeats). If a connection is cached, then reuse the existing connection for
> a few subsequent transactions until the connection expires. The heartbeat interval should
> be longer so that many more clients (on the order of tens of thousands) can be accommodated within
> one heartbeat interval.
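One way to read the "cache only briefly" option, as a minimal Java sketch: a per-server cache that reuses a connection for at most `maxUses` transactions or `ttlMillis` milliseconds, then tears it down and opens a fresh one. `ShortLivedConnectionCache`, `acquire`, `openedCount` and the `connector` function are hypothetical; `AutoCloseable` stands in for whatever connection type the IPC layer actually uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: cache a connection per server only briefly.
class ShortLivedConnectionCache<C extends AutoCloseable> {
    private static final class Entry<T> {
        final T conn;
        final long createdMillis;
        int uses;

        Entry(T conn, long createdMillis) {
            this.conn = conn;
            this.createdMillis = createdMillis;
        }
    }

    private final Function<String, C> connector; // assumed: opens a new connection to a server
    private final long ttlMillis;                // maximum lifetime of a cached connection
    private final int maxUses;                   // maximum transactions per connection
    private final Map<String, Entry<C>> cache = new HashMap<>();
    private int opened;

    ShortLivedConnectionCache(Function<String, C> connector, long ttlMillis, int maxUses) {
        this.connector = connector;
        this.ttlMillis = ttlMillis;
        this.maxUses = maxUses;
    }

    // Returns a connection for one transaction, reusing the cached one only while it is fresh.
    synchronized C acquire(String server, long nowMillis) {
        Entry<C> e = cache.get(server);
        if (e != null && (nowMillis - e.createdMillis >= ttlMillis || e.uses >= maxUses)) {
            try {
                e.conn.close(); // expired: tear the old connection down
            } catch (Exception ignored) {
                // a failed close should not block opening a fresh connection
            }
            cache.remove(server);
            e = null;
        }
        if (e == null) {
            e = new Entry<>(connector.apply(server), nowMillis);
            cache.put(server, e);
            opened++;
        }
        e.uses++;
        return e.conn;
    }

    synchronized int openedCount() {
        return opened;
    }
}
```

With `maxUses` set to 1, every transaction gets a fresh connection, which approximates the no-caching option. Note that in this sketch a stale connection is closed lazily on the next `acquire`, so a real implementation would probably also want a background sweep that closes idle connections.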

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
