On Wed, Sep 28, 2011 at 3:35 PM, Brian Geffon <briangeffon@gmail.com> wrote:
I'm encountering something strange with ATS 3.0.1 on Red Hat Enterprise Linux 6. Using a vanilla build with no modules enabled, the default records.config, and zero entries in the remap file, ATS idles at very high CPU (around 15-20%).

root     13509  0.0  0.0  58512  2392 ?        Ss   19:15   0:00 /usr/local/ats3.0.1plain//bin/traffic_cop
nobody   13511  0.1  0.0 480072 16632 ?        Sl   19:15   0:00 /usr/local/ats3.0.1plain/bin/traffic_manager
nobody   13521 ***16.9***  0.1 1584628 114224 ?      Sl   19:15   2:06 /usr/local/ats3.0.1plain/bin/traffic_server -M -A,7:X

So I used strace to try to determine what might be causing this, and here is what I've found:

[root@machine]# strace -c -p 13521
Process 13521 attached - interrupt to quit
^CProcess 13521 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    3.589451         796      4510           epoll_wait
------ ----------- ----------- --------- --------- ----------------
100.00    3.589451                  4510           total

It appears that the time is spent entirely in epoll_wait, and each call is taking 796 microseconds! So I have two concerns with this. First, why would epoll_wait take so long? 796 microseconds seems like a long time. More importantly, how could it possibly be called so frequently? Does ATS use a short timeout when calling epoll_wait?

I would really appreciate any feedback regarding this. Has anyone else experienced this? Is there anywhere else I might look to determine the cause? Could this be classified as _normal_ behavior?

It is working as designed. The epoll timeout is set to 0 or 10 msec on Linux. Is there very little or no load going through this instance? That would explain why you are seeing only epoll_wait. In that case nothing is really wrong: the process is just cycling through epoll_wait calls looking for something to do when nothing is available. Once the process starts taking enough traffic, you will start to see user-space and other kernel functions take CPU time.
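
To make the idle behavior concrete, here is a minimal sketch of a polling loop with a 10 msec timeout. It is not the ATS event loop, just an illustration in Python for brevity; count_idle_wakeups, TIMEOUT_S and RUN_FOR_S are names made up for this example. It assumes a platform where selectors.DefaultSelector is backed by epoll (Linux); elsewhere it falls back to another mechanism but behaves the same way for this purpose.

```python
import selectors
import socket
import time

TIMEOUT_S = 0.010   # the 10 msec timeout mentioned above
RUN_FOR_S = 0.2     # arbitrary observation window chosen for this sketch


def count_idle_wakeups(timeout_s=TIMEOUT_S, run_for_s=RUN_FOR_S):
    """Poll an idle socket and count how often the wait returns with no events."""
    # The socket pair never carries data, so every wait runs to its timeout,
    # much like a server that is receiving no traffic.
    reader, writer = socket.socketpair()
    wakeups = 0
    with selectors.DefaultSelector() as sel:   # epoll on Linux
        sel.register(reader, selectors.EVENT_READ)
        deadline = time.monotonic() + run_for_s
        while time.monotonic() < deadline:
            events = sel.select(timeout=timeout_s)
            if not events:
                wakeups += 1                   # woke up with nothing to do
    reader.close()
    writer.close()
    return wakeups


if __name__ == "__main__":
    n = count_idle_wakeups()
    print(f"{n} empty wakeups in {RUN_FOR_S} s "
          f"(about {n / RUN_FOR_S:.0f} per second)")
```

A single loop like this wakes up on the order of 100 times per second with nothing to do, and a process running several such threads multiplies that. With a timeout of 0 the wait returns immediately, so the loop would spin without sleeping at all. Either way, an strace summary of an idle process would show little besides the wait call.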