httpd-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lu, Yingqi" <yingqi...@intel.com>
Subject RE: SO_REUSEPORT in the children processes
Date Tue, 11 Mar 2014 04:45:17 GMT
Hi Yann,

As we pointed out in our original discussion thread, we dropped the child process implementation
due to the kernel defect with changing the number of open sockets. Now, we quickly tested
this child process implementation (prefork) with our workload on a modern Xeon dual sockets
server and most recent 3.13.6 kernel again.

1. We do not see "connection reset" errors during the run (ramp up and steady stay) any more.
However, we noticed that our workload cannot ramp down and terminate on its own with this
child process implementation. This never happened before with either "out of box" httpd or
the parent process implementation. After manually force shutdown the workload, we saw these
"connection reset" errors again.

2. During the run, we noticed that there are tons of "read timed out" errors. These errors
not only happen when the system is highly utilized, it even happens when system is only 10%
utilized. The response time was high.

3. Compared to parent process implementation, we found child process implementation results
in significantly higher (up to 10X) response time (The read timed out errors are not counted
in the result) at different CPU utilization levels. At peak performance level, it has ~22%
less throughput with tons of "connection reset" errors in additional to "read timed out" errors.
Parent process implementation does not have errors.

We think the reason of above findings may be caused by: 1. Too many open sockets created by
the children processes; and/or 2. Parent process does not have control, or maybe 3. Kernel
defect is not fully addressed. On the other hand, the parent implementation keeps minimal
number of open sockets that takes advantage of SO_REUSEPORT and keeps the environment more
controllable.

We are currently modifying the code based on all the feedbacks from the community with the
original parent process implementation which also includes separating the original patch between
with and without SO_REUSEPORT support. This would make SO_REUSEPORT patch cleaner and simpler.

Thanks,
Yingqi (people at work also call me Lucy:))


From: Yann Ylavic [mailto:ylavic.dev@gmail.com]
Sent: Friday, March 07, 2014 9:07 AM
To: httpd
Subject: SO_REUSEPORT in the children processes

Hi all,
the patch about SO_REUSEPORT proposed by Yingqi Lu in [1] and discussed in [2] uses listeners
buckets to address a defect [3] in the current linux implementation (his patch goes beyond
SO_REUSEPORT though, and suggests a new MPM even when the option is not available).
Should this defect be addressed by linux folks, the event/worker/prefork MPMs could take full
advantage of the option (linux-3.9+) with quite simple modifications of the current code.
I'm proposing here the corresponding patch.

The idea is to re-create and re-bind/listen the parent's listeners sockets for each child
process, when starting, before dropping priviledges.
For this, the patch introduces a new ap_reopen_listeners() function (meant to be called by
each child) to do the job on the inherited listeners. It does nothing unless HAVE_SO_REUSEPORT
is defined.

The advantage of this approach is that no accept mutex is needed anymore (each child has its
own sockets), hence the SAFE_ACCEPT macros can do nothing when HAVE_SO_REUSEPORT is defined.
The new (incoming) connections are evenly distributed accross the children for all the listners
(as assured by Linux when SO_REUSEPORT is used).
I'm proposing the patch here so that everyone can figure out whether SO_REUSEPORT per se needs
its own MPM or not (once/if the defect is fixed).
The option seems to be quite easily pluggable to existing MPMs (no ABI change), and I don't
see an advantage to not using it when available (and working).

Also, FreeBSD has an implementation of SO_REUSEPORT,
however
I couldn't find whether it has the same scheduling garantee
or not
(at least I guess the accept mutex can be avoided too).

Regarding the linux kernel defect, is someone aware of a fix/work on that in the latest versions?

Finally, about the accept mutex, mpm event seems to work well without any, why prefork and
worker would need one (both poll() all the listeners in a single thread, while other children
can do the same)?
The patch follows and is attached.
It can already be tested with a workaround against the defect: don't let perform_idle_server_maintenance()
create/stop children after startup (eg. StartServers, ServerLimit, Min/MaxSpareServers using
the same value).

Thoughts, feedbacks welcome.

Regards,
Yann.

[1] https://issues.apache.org/bugzilla/show_bug.cgi?id=55897#c7
[2] http://mail-archives.apache.org/mod_mbox/httpd-bugs/201312.mbox/%3Cbug-55897-7868@https.issues.apache.org/bugzilla/%3E
[3] http://lwn.net/Articles/542629/ and http://lwn.net/Articles/542738/

Index: server/mpm/event/event.c
===================================================================
--- server/mpm/event/event.c    (revision 1575322)
+++ server/mpm/event/event.c    (working copy)
@@ -2356,6 +2356,13 @@ static void child_main(int child_num_arg)
     /*stuff to do before we switch id's, so we have permissions. */
     ap_reopen_scoreboard(pchild, NULL, 0);

+    rv = ap_reopen_listeners(pchild, num_listensocks);
+    if (rv != APR_SUCCESS) {
+        ap_log_error(APLOG_MARK, APLOG_EMERG, rv, ap_server_conf, APLOGNO()
+                     "Couldn't re-open listeners");
+        clean_child_exit(APEXIT_CHILDFATAL);
+    }
+
     if (ap_run_drop_privileges(pchild, ap_server_conf)) {
         clean_child_exit(APEXIT_CHILDFATAL);
     }
Index: server/mpm/prefork/prefork.c
===================================================================
--- server/mpm/prefork/prefork.c    (revision 1575322)
+++ server/mpm/prefork/prefork.c    (working copy)
@@ -271,7 +271,9 @@ static void accept_mutex_off(void)
  * multiple Listen statements.  Define SINGLE_LISTEN_UNSERIALIZED_ACCEPT
  * when it's safe in the single Listen case.
  */
-#ifdef SINGLE_LISTEN_UNSERIALIZED_ACCEPT
+#if HAVE_SO_REUSEPORT
+#define SAFE_ACCEPT(stmt)
+#elif defined(SINGLE_LISTEN_UNSERIALIZED_ACCEPT)
 #define SAFE_ACCEPT(stmt) do {if (ap_listeners->next) {stmt;}} while(0)
 #else
 #define SAFE_ACCEPT(stmt) do {stmt;} while(0)
@@ -536,6 +538,13 @@ static void child_main(int child_num_arg)
         clean_child_exit(APEXIT_CHILDFATAL);
     }

+    status = ap_reopen_listeners(pchild, num_listensocks);
+    if (status != APR_SUCCESS) {
+        ap_log_error(APLOG_MARK, APLOG_EMERG, status, ap_server_conf, APLOGNO()
+                     "Couldn't re-open listeners");
+        clean_child_exit(APEXIT_CHILDFATAL);
+    }
+
     if (ap_run_drop_privileges(pchild, ap_server_conf)) {
         clean_child_exit(APEXIT_CHILDFATAL);
     }
Index: server/mpm/worker/worker.c
===================================================================
--- server/mpm/worker/worker.c    (revision 1575322)
+++ server/mpm/worker/worker.c    (working copy)
@@ -220,7 +220,9 @@ static apr_os_thread_t *listener_os_thread;
 /* Locks for accept serialization */
 static apr_proc_mutex_t *accept_mutex;

-#ifdef SINGLE_LISTEN_UNSERIALIZED_ACCEPT
+#if HAVE_SO_REUSEPORT
+#define SAFE_ACCEPT(stmt)
+#elif defined(SINGLE_LISTEN_UNSERIALIZED_ACCEPT)
 #define SAFE_ACCEPT(stmt) (ap_listeners->next ? (stmt) : APR_SUCCESS)
 #else
 #define SAFE_ACCEPT(stmt) (stmt)
@@ -1228,6 +1230,13 @@ static void child_main(int child_num_arg)
     /*stuff to do before we switch id's, so we have permissions.*/
     ap_reopen_scoreboard(pchild, NULL, 0);

+    rv = ap_reopen_listeners(pchild, num_listensocks);
+    if (rv != APR_SUCCESS) {
+        ap_log_error(APLOG_MARK, APLOG_EMERG, rv, ap_server_conf, APLOGNO()
+                     "Couldn't re-open listeners");
+        clean_child_exit(APEXIT_CHILDFATAL);
+    }
+
     rv = SAFE_ACCEPT(apr_proc_mutex_child_init(&accept_mutex,
                                                apr_proc_mutex_lockfile(accept_mutex),
                                                pchild));
Index: server/listen.c
===================================================================
--- server/listen.c    (revision 1575322)
+++ server/listen.c    (working copy)
@@ -60,6 +60,9 @@ static apr_status_t make_sock(apr_pool_t *p, ap_li
 #endif
     apr_status_t stat;

+    /* until further notice */
+    server->sd = NULL;
+
 #ifndef WIN32
     stat = apr_socket_opt_set(s, APR_SO_REUSEADDR, one);
     if (stat != APR_SUCCESS && stat != APR_ENOTIMPL) {
@@ -125,6 +128,23 @@ static apr_status_t make_sock(apr_pool_t *p, ap_li
 #endif

     if (do_bind_listen) {
+#if HAVE_SO_REUSEPORT
+        {
+            int yes = 1;
+            apr_os_sock_t sd = -1;
+            apr_os_sock_get(&sd, s);
+            if (setsockopt(sd, SOL_SOCKET, SO_REUSEPORT,
+                           (void *)&yes, sizeof(int)) == -1) {
+                stat = errno;
+                ap_log_perror(APLOG_MARK, APLOG_CRIT, stat, p, APLOGNO()
+                              "make_sock: for address %pI, setsockopt: "
+                              "(SO_REUSEPORT)", server->bind_addr);
+                apr_socket_close(s);
+                return stat;
+            }
+        }
+#endif
+
 #if APR_HAVE_IPV6
         if (server->bind_addr->family == APR_INET6) {
             stat = apr_socket_opt_set(s, APR_IPV6_V6ONLY, v6only_setting);
@@ -727,6 +747,34 @@ AP_DECLARE(int) ap_setup_listeners(server_rec *s)
     return num_listeners;
 }

+AP_DECLARE(apr_status_t) ap_reopen_listeners(apr_pool_t *p, int num)
+{
+#if HAVE_SO_REUSEPORT
+    ap_listen_rec *lr;
+
+    for (lr = ap_listeners; lr && num; lr = lr->next, --num) {
+        apr_socket_t *sd = lr->sd, *s;
+
+        rv = apr_socket_create(&s, lr->bind_addr->family, SOCK_STREAM, 0, p);
+        if (rv != APR_SUCCESS) {
+            return rv;
+        }
+        lr->sd = s;
+
+        rv = make_sock(p, lr, 1);
+        if (rv != APR_SUCCESS) {
+            /* make_sock() has closed lr->sd,
+             * restore the original socket */
+            lr->sd = sd;
+            return rv;
+        }
+        /* not needed anymore */
+        apr_socket_close(sd);
+    }
+#endif
+    return APR_SUCCESS;
+}
+
 AP_DECLARE_NONSTD(void) ap_close_listeners(void)
 {
     ap_listen_rec *lr;
Index: include/ap_listen.h
===================================================================
--- include/ap_listen.h    (revision 1575322)
+++ include/ap_listen.h    (working copy)
@@ -92,6 +92,15 @@ AP_DECLARE(void) ap_listen_pre_config(void);
 AP_DECLARE(int) ap_setup_listeners(server_rec *s);

 /**
+ * Loop through the global ap_listen_rec list and reopen up to @a num sockets
+ * when applicable (eg. HAVE_SO_REUSEPORT to reuse ports).
+ * @param p The pool reopen the sockets on
+ * @param num The number of sockets to reopen (all if negative)
+ * @return APR_SUCCESS or an APR error
+ */
+AP_DECLARE(apr_status_t) ap_reopen_listeners(apr_pool_t *p, int num);
+
+/**
  * Loop through the global ap_listen_rec list and close each of the sockets.
  */
 AP_DECLARE_NONSTD(void) ap_close_listeners(void);
[EOS]


Mime
View raw message