hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-885) Reduce CPU usage on namenode: gettimeofday
Date Fri, 02 Feb 2007 01:43:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469638 ]

Raghu Angadi commented on HADOOP-885:
-------------------------------------


Should we have a separate patch for the maxIdleTime change?

The main cost of keeping connections open longer is the cost of poll() and of iterating in cleanupConnections().
This cost will become more significant with a larger number of clients and datanodes. I would
suggest the following changes:

  1) cleanupConnections() should be called only once every few seconds.
  2) We should start using epoll(), added in JDK 1.5.0_10 (this is a java command-line option): http://java.sun.com/j2se/1.5.0/ReleaseNotes.html#150_10


With these changes we can set the timeout much higher, maybe 1 minute.
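
To illustrate suggestion 1), here is a minimal sketch of a time-throttled cleanup pass. It is not the actual Server.java code: the class ThrottledCleanup, the Connection stand-in, and the fields maxIdleTimeMs and cleanupIntervalMs are all hypothetical names chosen for illustration, and the caller is assumed to pass in a timestamp it has already obtained.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Sketch only: all names here are hypothetical, not Hadoop's Server.java. */
class ThrottledCleanup {
    /** Minimal stand-in for a server-side connection. */
    static final class Connection {
        final long lastContactMs; // time of last activity
        Connection(long lastContactMs) { this.lastContactMs = lastContactMs; }
    }

    private final List<Connection> connections = new ArrayList<Connection>();
    private final long maxIdleTimeMs;     // e.g. 60000 for a 1 minute timeout
    private final long cleanupIntervalMs; // e.g. 5000 for "once every few seconds"
    private long lastCleanupMs;

    ThrottledCleanup(long maxIdleTimeMs, long cleanupIntervalMs, long nowMs) {
        this.maxIdleTimeMs = maxIdleTimeMs;
        this.cleanupIntervalMs = cleanupIntervalMs;
        this.lastCleanupMs = nowMs;
    }

    void add(Connection c) { connections.add(c); }

    int size() { return connections.size(); }

    /** Scans for idle connections at most once per interval; returns true if a scan ran. */
    boolean cleanupConnections(long nowMs) {
        if (nowMs - lastCleanupMs < cleanupIntervalMs) {
            return false; // too soon since the last scan, skip the iteration entirely
        }
        lastCleanupMs = nowMs;
        for (Iterator<Connection> it = connections.iterator(); it.hasNext();) {
            if (nowMs - it.next().lastContactMs > maxIdleTimeMs) {
                it.remove(); // a real server would also close the socket here
            }
        }
        return true;
    }
}
```

With this shape the per-request cost is a single comparison, and the full iteration over connections runs only once per interval regardless of request rate.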



> Reduce CPU usage on namenode: gettimeofday
> ------------------------------------------
>
>                 Key: HADOOP-885
>                 URL: https://issues.apache.org/jira/browse/HADOOP-885
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.10.1
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>         Attachments: gettime1.patch, WallClock.java
>
>
> On a 900 node idle cluster, the namenode spends about 20% of CPU. Most of this CPU is
> spent processing pure heartbeats. No jobs are running on this cluster and all nodes are alive
> and acting well.
> Of the total namenode CPU usage, about 12% is in usermode and about 70% is in kernel
> mode! The question that naturally arises is why is heartbeat processing taking so much time
> in kernel mode?
> An strace of namenode reveals that a 20 second period has about 52000 syscalls with the
> following breakdown:
> gettimeofday : 18000 calls
> accept       :  2655 calls
> close        :  2655 calls
> shutdown     :  2655 calls
> fcntl        :  7965 calls
> read         :  7965 calls
> futex        :  5295 calls
> poll         :  4894 calls
> A code inspection reveals that the code makes multiple (about 5) calls to System.currentTimeMillis()
> while processing a single request in the RPC.java and Server.java classes. This suggests that
> there is room for optimization.
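
One common way to cut repeated System.currentTimeMillis() calls is a coarse cached clock that a background thread refreshes at a fixed resolution. The sketch below is only an illustration of that idea under stated assumptions: CachedClock and resolutionMs are hypothetical names, and this is not the attached WallClock.java or gettime1.patch, whose contents are not shown in this message.

```java
/** Hypothetical sketch of a coarse cached clock; not the attached WallClock.java. */
final class CachedClock {
    // Last sampled wall-clock time; volatile so reader threads see updates.
    private volatile long nowMs = System.currentTimeMillis();
    private final Thread updater;

    CachedClock(final long resolutionMs) {
        updater = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        Thread.sleep(resolutionMs);
                        nowMs = System.currentTimeMillis(); // one real clock read per tick
                    }
                } catch (InterruptedException e) {
                    // stop() was called; let the thread exit
                }
            }
        }, "cached-clock");
        updater.setDaemon(true); // do not keep the JVM alive for the clock
        updater.start();
    }

    /** Returns the cached time, which may be up to roughly resolutionMs stale. */
    long now() { return nowMs; }

    void stop() { updater.interrupt(); }
}
```

Request-handling code would call now() instead of System.currentTimeMillis(), trading timestamp precision (bounded by the chosen resolution) for far fewer clock reads per request. Whether that precision is acceptable for idle-time and timeout checks is a judgment call for the actual patch.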

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

