hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-885) Reduce CPU usage on namenode: gettimeofday
Date Fri, 02 Feb 2007 18:19:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469827 ]

Raghu Angadi commented on HADOOP-885:

Since we poll on idle connections, there is CPU cost even for idle connections. In that sense,
increasing it to 4 sec would actually increase CPU cost on large clusters. Of course, I don't
have much justification for 1 sec either :).

Using epoll() is not platform dependent since it does not change the Java API. It's just a
command-line option. We could set this only on Linux in hadoop-env.sh.
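
A small check like the one below can confirm which selector implementation a JVM actually picked up after a command-line option is set. This is only a sketch: the comment above does not name the option. The standard system property java.nio.channels.spi.SelectorProvider is assumed here, and the provider class name in the code comment (sun.nio.ch.EPollSelectorProvider) is a JDK-internal name that may differ or be unnecessary on other JVMs. The class name SelectorProviderCheck is hypothetical.

```java
import java.nio.channels.spi.SelectorProvider;

// Hypothetical helper, not part of Hadoop: reports the NIO selector provider in use.
final class SelectorProviderCheck {

    /** Returns the class name of the system-wide default selector provider. */
    static String providerName() {
        return SelectorProvider.provider().getClass().getName();
    }

    public static void main(String[] args) {
        // Assumed invocation on Linux (property name is standard, value is JDK-internal):
        //   java -Djava.nio.channels.spi.SelectorProvider=sun.nio.ch.EPollSelectorProvider SelectorProviderCheck
        System.out.println("Selector provider: " + providerName());
    }
}
```

Running it with and without the option shows whether the setting took effect on a given platform.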

Also, we should call cleanupConnections() only once or twice per maxIdleTime interval.
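
One way to picture this throttling is the sketch below. It is not the Server.java code. The class ThrottledCleanup and the method maybeCleanup are hypothetical names, and the choice of maxIdleTime / 2 as the interval is an assumption that yields roughly two sweeps per idle period.

```java
// Hypothetical sketch: run the idle-connection sweep at most about twice per maxIdleTime.
final class ThrottledCleanup {
    private final long cleanupIntervalMs; // assumed: half of maxIdleTime
    private long lastCleanupMs;
    private int sweeps;

    ThrottledCleanup(long maxIdleTimeMs, long startMs) {
        this.cleanupIntervalMs = Math.max(1, maxIdleTimeMs / 2);
        this.lastCleanupMs = startMs;
    }

    /**
     * Called on every pass of the server loop with a timestamp the caller
     * already holds, so the check itself needs no extra clock read.
     * Returns true when a sweep was performed.
     */
    boolean maybeCleanup(long nowMs) {
        if (nowMs - lastCleanupMs < cleanupIntervalMs) {
            return false; // too soon since the last sweep, skip
        }
        lastCleanupMs = nowMs;
        sweeps++;
        // A real server would close connections idle longer than maxIdleTime here.
        return true;
    }

    int sweeps() {
        return sweeps;
    }

    public static void main(String[] args) {
        ThrottledCleanup cleanup = new ThrottledCleanup(1000, 0);
        // Simulate a loop pass every 10 ms for one full idle period.
        for (long t = 0; t <= 1000; t += 10) {
            cleanup.maybeCleanup(t);
        }
        System.out.println("sweeps in one maxIdleTime: " + cleanup.sweeps());
    }
}
```

With a 1000 ms idle time, the 101 simulated loop passes trigger only two sweeps.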

Though changing it to 4 sec is a very small change, it is different from the getTime() change. We
could open another Jira on this.

> Reduce CPU usage on namenode: gettimeofday
> ------------------------------------------
>                 Key: HADOOP-885
>                 URL: https://issues.apache.org/jira/browse/HADOOP-885
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.10.1
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>         Attachments: gettime1.patch, WallClock.java
> On a 900 node idle cluster, the namenode spends about 20% of CPU. Most of this CPU is
> spent processing pure heartbeats. No jobs are running on this cluster and all nodes are alive
> and acting well.
> Of the total namenode CPU usage, about 12% is in usermode and about 70% is in kernel
> mode! The question that naturally arises is why is heartbeat processing taking so much time
> in kernel mode?
> An strace of namenode reveals that a 20 second period has about 52000 syscalls with the
> following breakdown:
> gettimeofday : 18000 calls
> accept       :  2655 calls
> close        :  2655 calls
> shutdown     :  2655 calls
> fcntl        :  7965 calls
> read         :  7965 calls
> futex        :  5295 calls
> poll         :  4894 calls
> Code inspection shows about 5 calls to System.currentTimeMillis() while processing a
> single request in the RPC.java and Server.java classes, which suggests there is room
> for optimization.
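
The kind of optimization hinted at in the description can be sketched as follows: read the clock once per request and reuse that value wherever a timestamp is needed. This is an illustration under assumptions, not the attached patch. RequestClockSketch, CountingClock, handleNaive and handleOnce are hypothetical names, and the five timestamps stand in for whatever RPC.java and Server.java actually record.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

// Hypothetical sketch: compare five clock reads per request with a single shared read.
final class RequestClockSketch {

    /** Wraps a time source and counts how often it is consulted. */
    static final class CountingClock implements LongSupplier {
        private final LongSupplier delegate;
        private int calls;

        CountingClock(LongSupplier delegate) {
            this.delegate = delegate;
        }

        @Override
        public long getAsLong() {
            calls++;
            return delegate.getAsLong();
        }

        int calls() {
            return calls;
        }
    }

    /** Each stage of request handling asks the clock itself (one read per stamp). */
    static void handleNaive(LongSupplier clock, long[] stamps) {
        for (int i = 0; i < stamps.length; i++) {
            stamps[i] = clock.getAsLong();
        }
    }

    /** The clock is read once and the value is shared by every stage. */
    static void handleOnce(LongSupplier clock, long[] stamps) {
        long now = clock.getAsLong();
        Arrays.fill(stamps, now);
    }

    public static void main(String[] args) {
        CountingClock naive = new CountingClock(System::currentTimeMillis);
        handleNaive(naive, new long[5]);

        CountingClock once = new CountingClock(System::currentTimeMillis);
        handleOnce(once, new long[5]);

        System.out.println("naive clock reads: " + naive.calls());
        System.out.println("shared clock reads: " + once.calls());
    }
}
```

The trade-off is that all stages of one request see the same timestamp, which is likely acceptable when the values are only compared against timeouts measured in seconds.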

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
