hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-312) Connections should not be cached
Date Mon, 07 Aug 2006 18:36:19 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-312?page=all ]

Devaraj Das updated HADOOP-312:

               Status: Patch Available  (was: Open)
    Affects Version/s:     (was: 0.4.0)

This patch implements the following:
1) Caching of client-server connections is now optional and is disabled by default.
2) When caching is disabled, a client disconnects a connection to a server once it has been idle
for a configured time. The idle time defaults to 1 second.
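
The two behaviors listed above can be sketched roughly as follows. This is only an illustration and not the code in no_connection_caching.patch: the class IdleConnectionPool, its methods, and the Connection stand-in are hypothetical names, and the caller passes in the current time so that the idle check is easy to follow.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical sketch of optional connection caching with an idle timeout. */
class IdleConnectionPool {
    /** Stand-in for a real client-server connection (assumption for illustration). */
    static final class Connection {
        final String server;
        long lastUsedMillis;
        boolean open = true;

        Connection(String server, long nowMillis) {
            this.server = server;
            this.lastUsedMillis = nowMillis;
        }
    }

    private final boolean cachingEnabled;   // false mirrors the no-caching default
    private final long maxIdleMillis;       // 1000 mirrors the 1 second default
    private final Map<String, Connection> connections = new HashMap<>();

    IdleConnectionPool(boolean cachingEnabled, long maxIdleMillis) {
        this.cachingEnabled = cachingEnabled;
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Returns an open connection to the server, creating one if none exists. */
    synchronized Connection get(String server, long nowMillis) {
        Connection c = connections.get(server);
        if (c == null || !c.open) {
            c = new Connection(server, nowMillis);
            connections.put(server, c);
        }
        c.lastUsedMillis = nowMillis;
        return c;
    }

    /**
     * Closes every connection that has been idle for at least maxIdleMillis.
     * Does nothing when caching is enabled. Returns the number closed.
     */
    synchronized int closeIdle(long nowMillis) {
        if (cachingEnabled) {
            return 0;
        }
        int closed = 0;
        Iterator<Connection> it = connections.values().iterator();
        while (it.hasNext()) {
            Connection c = it.next();
            if (nowMillis - c.lastUsedMillis >= maxIdleMillis) {
                c.open = false;
                it.remove();
                closed++;
            }
        }
        return closed;
    }
}
```

In this sketch a background thread would call closeIdle periodically; a client that needs the server again after its connection was closed simply gets a fresh connection from get.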

The performance cost of not caching is that clients occasionally fail to establish
a connection to a server (if the server is too busy to accept incoming connections). I have
seen this occasionally with the TaskTracker -> JobTracker protocol.
When it happens, the JobTracker assumes that the TaskTracker is lost, and all the tasks that
were running on this "lost" TaskTracker are rerun. This slows down
the overall progress of the job. Of course, the same thing happens when connections
are cached; the difference is that the RPCs time out instead of the connect failing.

If the above doesn't happen, the performance figures with and without caching on a 370-node cluster
are nearly the same.

> Connections should not be cached
> --------------------------------
>                 Key: HADOOP-312
>                 URL: http://issues.apache.org/jira/browse/HADOOP-312
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Devaraj Das
>         Assigned To: Devaraj Das
>         Attachments: no_connection_caching.patch
> Servers and clients (clients include datanodes, tasktrackers, DFSClients & tasks)
should not cache connections, or should cache them only for very short periods of time. Clients should
set up & tear down connections to the servers every time they need to contact the servers
(including for heartbeats). If a connection is cached, it should be reused for
a few subsequent transactions until the connection expires. The heartbeat interval should
be longer so that many more clients (on the order of tens of thousands) can be accommodated within
1 heartbeat interval.
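
As a rough illustration of the heartbeat sizing argument above: if every client heartbeats once per interval, the interval must be at least as long as the time the server needs to handle one heartbeat from each client. The sketch below assumes a simple model in which each heartbeat (including connection setup and teardown) costs a fixed time and the work is spread evenly over a number of handler threads. The class HeartbeatSizing, the method name, and all input values are hypothetical and are not measurements from any cluster.

```java
/** Hypothetical back-of-the-envelope sizing for the heartbeat interval. */
final class HeartbeatSizing {
    private HeartbeatSizing() {}

    /**
     * Smallest interval (ms) in which a server can handle one heartbeat from
     * every client, assuming a fixed cost per heartbeat and work spread evenly
     * across handler threads. Rounds up to a whole millisecond.
     */
    static long minIntervalMillis(long clients, long costMillisPerHeartbeat, int handlerThreads) {
        if (clients < 0 || costMillisPerHeartbeat < 0 || handlerThreads <= 0) {
            throw new IllegalArgumentException("inputs must be non-negative, with at least one handler");
        }
        long totalWorkMillis = Math.multiplyExact(clients, costMillisPerHeartbeat);
        return (totalWorkMillis + handlerThreads - 1) / handlerThreads;
    }
}
```

With illustrative inputs of 20000 clients, 2 ms per heartbeat, and 10 handler threads, minIntervalMillis returns 4000, so under this model the interval would need to be at least 4 seconds.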


