httpd-dev mailing list archives

From: Brian Pane
Subject: Latest profile of performance bottlenecks in 2.0
Date: Mon, 29 Oct 2001 00:34:17 GMT

Here is an updated list of the top 20 CPU-consuming functions
in 2.0.  As with the previous profiles that I've posted, this
one reflects the delivery of server-parsed HTML pages, as
measured by Quantify on Solaris.  There are two important
differences, however:
  1. The percentages listed here are of the program's total
     usr CPU time, not usr+sys.  (I'm ignoring system calls
     for now because 2.0 has been doing well in that area.)
  2. This profile is for an httpd with my directory_walk/location_walk
     "pre-merge" patch applied.  Without this patch, the profile
     would look a bit different, as mod_mime's dir-merge function
     would account for about 15% of the total usr CPU time.

The top 20:
 1. bndm                   23.76%
     No problem here; the bndm algorithm, used to identify
     "<!--#" tokens during mod_include parsing, seems to be near
     optimal for that application.  As other parts of the httpd
     get more efficient, this function's share should increase
     toward 100% of the non-syscall time.
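
     For readers unfamiliar with the algorithm, here is a minimal
     sketch of textbook BNDM (Backward Nondeterministic Dawg
     Matching).  It is not the mod_include code: the name
     bndm_search, its signature, and the 32-character pattern
     limit are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the mod_include implementation.
 * Returns the offset of the first occurrence of pat in text,
 * or -1 if there is none (or if m is 0 or greater than 32). */
long bndm_search(const char *text, size_t n, const char *pat, size_t m)
{
    uint32_t B[256] = {0};
    uint32_t mask, high;
    size_t pos, i;

    if (m == 0 || m > 32 || n < m)
        return -1;

    mask = (m == 32) ? 0xFFFFFFFFu : ((1u << m) - 1u);
    high = 1u << (m - 1);

    /* Pattern character i sets bit (m-1-i) in its mask. */
    for (i = 0; i < m; i++)
        B[(unsigned char)pat[i]] |= 1u << (m - 1 - i);

    pos = 0;
    while (pos <= n - m) {
        size_t j = m, last = m;
        uint32_t D = mask;

        /* Scan the window right to left while some pattern
         * substring still matches what has been read. */
        while (D != 0) {
            D &= B[(unsigned char)text[pos + j - 1]];
            j--;
            if (D & high) {
                if (j > 0)
                    last = j;          /* a pattern prefix ends the window */
                else
                    return (long)pos;  /* the whole window matched */
            }
            D = (D << 1) & mask;
        }
        pos += last;  /* skip ahead; up to m characters at a time */
    }
    return -1;
}
```

     When the last character of a window cannot appear in the
     pattern at all, the window moves forward by the full pattern
     length after a single table lookup, which is why ordinary
     HTML text with few "<!--#" tokens is scanned so cheaply.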

 2. strcasecmp              7.77%
     Mostly from apr_table_get, apr_table_setn, and
     sort_overlap (used in the table overlap operations)...
     I think we've finally reached the point where the
     O(n) table scans are the top bottleneck in the code.
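
     To illustrate why the scans show up as strcasecmp time, here
     is a minimal sketch of a linear, case-insensitive table
     lookup.  The demo_entry type, demo_table_get, and the
     comparison counter are hypothetical and only mimic the shape
     of such a lookup; this is not the apr_table_t code.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical key/value pair, for illustration only. */
typedef struct {
    const char *key;
    const char *val;
} demo_entry;

/* Linear scan: one strcasecmp per entry until the key is found.
 * *ncompares (if non-NULL) is incremented once per comparison. */
const char *demo_table_get(const demo_entry *elts, size_t nelts,
                           const char *key, size_t *ncompares)
{
    size_t i;

    for (i = 0; i < nelts; i++) {
        if (ncompares)
            ++*ncompares;
        if (strcasecmp(elts[i].key, key) == 0)
            return elts[i].val;
    }
    return NULL;  /* a miss costs nelts comparisons */
}
```

     Every miss, and every hit near the end of the table, costs
     one strcasecmp per entry, so the total grows with both the
     number of lookups per request and the number of entries.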

 3. strlen                  5.69%
     The biggest contributor to this one is directory_walk,
     which seems to be doing a lot more strlen calls in its
     latest version.  Other major callers are apr_pool_userdata_set
     (my recent patch to add a "_setn" variant will partially
     fix this one) and strdup calls within apr_filepath_merge and

 4. apr_file_read           3.10%
     Called mostly to read the config file during startup,
     so I'm not worried about it...

 5. memset                  2.14%
     Used in apr_pcalloc

 6. find_entry              2.07%
     This is part of the implementation of the apr_hash_t
     get/set functions, which in turn are used mostly in the
     pool userdata API.  If anybody can speed up this function
     (possibly by optimizing the hash computation?), it will
     be beneficial for APR apps in general.
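
     As a point of reference for the hash computation, here is a
     minimal sketch of a multiply-by-33 byte hash with power-of-two
     bucket selection.  That find_entry works this way is an
     assumption; demo_hash and demo_bucket are hypothetical names,
     not APR functions.

```c
#include <stddef.h>

/* Hypothetical times-33 hash over klen bytes of key. */
unsigned int demo_hash(const void *key, size_t klen)
{
    const unsigned char *p = key;
    unsigned int hash = 0;
    size_t i;

    /* One multiply and one add per key byte. */
    for (i = 0; i < klen; i++)
        hash = hash * 33 + p[i];
    return hash;
}

/* Bucket index for a table whose size is a power of two;
 * max is (number of buckets - 1). */
size_t demo_bucket(unsigned int hash, size_t max)
{
    return hash & max;
}
```

     With a hash of this shape the cost is proportional to the key
     length, so long userdata keys are paid for on every get and
     set.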

 7. memchr                  1.72%
     Most of the calls to this are from core_input_filter.

 8. strchr                  1.70%
     directory_walk makes about 75% of the calls to strchr.

 9. apr_palloc              1.47%
     The three big callers of apr_palloc are:
      - apr_pcalloc (no obvious optimizations here)
      - apr_pstrdup (reducing the strdup calls, as described
          above in the discussion of strlen, will fix this)
      - apr_pool_cleanup_register (I think this is mostly
          from apr_pool_userdata_set; my userdata patch
          optimizes away some of the cleanup registration)

10. apr_lock_release        1.27%
11. apr_lock_acquire        1.27%
     Most of the calls to these are from apr_file_read--and
     therefore affect startup rather than request processing.

12. strcmp                  1.06%
     directory_walk, handle_include (in mod_include), and
     location_walk constitute most of the calls to strcmp.

13. tolower                 1.05%
     ap_add_any_filter and the ap_strcasestr call in

14. ap_directory_walk       1.02%
     I don't know where in directory_walk the bulk of this
     time is being spent...but it doesn't matter much, because
     things called from directory_walk represent much bigger
     opportunities for optimization.

15. qsort                   0.99%
     apr_table_overlap uses qsort.
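
     A minimal sketch of how a qsort over table entries ties back
     to the strcasecmp cost in item 2, assuming the comparator
     compares keys case-insensitively.  The demo_pair type and
     both functions are hypothetical; this is not the
     apr_table_overlap or sort_overlap code.

```c
#include <stdlib.h>    /* qsort */
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical key/value pair, for illustration only. */
typedef struct {
    const char *key;
    const char *val;
} demo_pair;

/* qsort comparator: orders pairs by key, ignoring case. */
int demo_pair_cmp(const void *a, const void *b)
{
    const demo_pair *pa = a;
    const demo_pair *pb = b;

    return strcasecmp(pa->key, pb->key);
}

/* Sort pairs in place so that equal keys become adjacent. */
void demo_sort_pairs(demo_pair *pairs, size_t n)
{
    qsort(pairs, n, sizeof(*pairs), demo_pair_cmp);
}
```

     Each comparison made by qsort is one strcasecmp call, so a
     sort of this kind contributes to both the qsort and the
     strcasecmp lines of the profile.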

16. apr_os_thread_current   0.96%
     This is called mostly from apr_lock_acquire/release.
     For the reasons noted above, it doesn't have a big effect
     on request processing.

17. apr_table_setn          0.90%

18. apr_vformatter          0.88%
     Optimizing away the apr_psprintf calls in ap_make_etag and
     ap_add_common_vars would substantially reduce the time spent
     in apr_vformatter.  Note that we're in the <1% category here,

19. strrchr                 0.86%

20. memcmp                  0.83%
     memcmp is called primarily by the apr_hash_t lookup functions.
