Mailing-List: contact dev-help@httpd.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@httpd.apache.org
Received-SPF: pass (athena.apache.org: domain of jkaluza@redhat.com designates
 209.132.183.28 as permitted sender)
Message-ID: <51DBC313.5000103@redhat.com>
Date: Tue, 09 Jul 2013 10:00:19 +0200
From: =?ISO-8859-15?Q?Jan_Kalu=B8a?= <jkaluza@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:17.0) Gecko/20130514 Thunderbird/17.0.6
MIME-Version: 1.0
To: dev@httpd.apache.org
Subject: Re: [PATCH] mod_unique_id: use ap_random_insecure_bytes() to get
 unique ID
References: <51CAE356.9020109@redhat.com> <201307060158.15619.sf@sfritsch.de>
In-Reply-To: <201307060158.15619.sf@sfritsch.de>
Content-Type: multipart/mixed;
 boundary="------------070507070701020809040904"

This is a multi-part message in MIME format.
--------------070507070701020809040904
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 8bit

On 07/06/2013 01:58 AM, Stefan Fritsch wrote:
> On Wednesday 26 June 2013, Jan Kalu�a wrote:
>> currently mod_unique_id uses apr_gethostname(...) and PID pair as a
>> base to generate unique ID. The way how it's implemented brings
>> some problems:
>>
>> 1. For IPv6-only hosts it uses low-order bits of IPv6 address as if
>> they were unique, which is wrong.
>>
>> 2. It relies on working DNS. It can happen that hostname does not
>> have the IP assigned during httpd start (for example during the
>> boot) and I think it is still valid use-case (without
>> mod_unique_id module loaded, httpd works well in this case).
>>
>> 3. It calls 1 second sleep to overcome possible usage of the same
>> PID after restart as the one used before the restart.
>>
>> If I'm right, we could fix the problems above by using
>> ap_random_insecure_bytes instead of "in_addr"/"pid" pair as a base
>> for unique ID generation. It would also make the code simpler. I
>> think the randomness generated by ap_random_insecure_bytes() is at
>> least the same as the one introduced by apr_gethostname() + pid
>> pair.
>>
>> The attached patch implements it by removing in_addr/pid fields
>> unique_id_rec struct and introduces new "root" field which is
>> initialized using ap_random_insecure_bytes() function and is used
>> as a base for unique IDs.
>
> I agree in principle, but I would prefer the ID length to not increase
> that much. You replace 8 bytes ip+pid with 20 bytes "root", which
> means 16 bytes increase in base64 format. The unique id is also used
> as a request log id and having an additional 16 bytes of prefix in the
> error log (if one uses that feature) does not really appeal to me.

Thanks for reviewing the patch.

I agree 20 bytes could be too much. I have changed my patch to have only 
10 bytes long root. I will check the Daniel's ideas mentioned in another 
mail in this thread and try to implement it, but if we are going to do 
it my way, I think this patch should be OK.

> But 20 bytes may be excessive, anyway: If one assumes 10000 servers
> over which the IDs needs to be unique, and uses a 8 byte "root" field,
> the probability for a collision is 3*10^(-12) if I did the math
> correctly. This still has to be multiplied with the number of restarts
> of the servers, so to be save we could use a 10 byte "root" field,
> giving a collision probability (per server restart) of 4*10^(-17),
> which should be ok.
>

Regards,
Jan Kaluza


--------------070507070701020809040904
Content-Type: text/x-patch;
 name="httpd-2.4.4-uniqueid-no-addr.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="httpd-2.4.4-uniqueid-no-addr.patch"

diff --git a/modules/metadata/mod_unique_id.c b/modules/metadata/mod_unique_id.c
index 8bd858f..8756055 100644
--- a/modules/metadata/mod_unique_id.c
+++ b/modules/metadata/mod_unique_id.c
@@ -31,14 +31,11 @@
 #include "http_log.h"
 #include "http_protocol.h"  /* for ap_hook_post_read_request */
 
-#if APR_HAVE_UNISTD_H
-#include <unistd.h>         /* for getpid() */
-#endif
+#define ROOT_SIZE 10
 
 typedef struct {
     unsigned int stamp;
-    unsigned int in_addr;
-    unsigned int pid;
+    char root[ROOT_SIZE];
     unsigned short counter;
     unsigned int thread_index;
 } unique_id_rec;
@@ -64,18 +61,13 @@ typedef struct {
  * gethostbyname (gethostname()) is unique across all the machines at the
  * "site".
  *
- * We also further assume that pids fit in 32-bits.  If something uses more
- * than 32-bits, the fix is trivial, but it requires the unrolled uuencoding
- * loop to be extended.  * A similar fix is needed to support multithreaded
- * servers, using a pid/tid combo.
- *
- * Together, the in_addr and pid are assumed to absolutely uniquely identify
+ * The root is assumed to absolutely uniquely identify
  * this one child from all other currently running children on all servers
  * (including this physical server if it is running multiple httpds) from each
  * other.
  *
  * The stamp and counter are used to distinguish all hits for a particular
- * (in_addr,pid) pair.  The stamp is updated using r->request_time,
+ * root.  The stamp is updated using r->request_time,
  * saving cpu cycles.  The counter is never reset, and is used to permit up to
  * 64k requests in a single second by a single child.
  *
@@ -92,7 +84,7 @@ typedef struct {
  * module change.
  *
  * It is highly desirable that identifiers exist for "eternity".  But future
- * needs (such as much faster webservers, moving to 64-bit pids, or moving to a
+ * needs (such as much faster webservers, or moving to a
  * multithreaded server) may dictate a need to change the contents of
  * unique_id_rec.  Such a future implementation should ensure that the first
  * field is still a time_t stamp.  By doing that, it is possible for a site to
@@ -116,8 +108,6 @@ typedef struct {
  * htonl/ntohl. Well, this shouldn't be a problem till year 2106.
  */
 
-static unsigned global_in_addr;
-
 /*
  * XXX: We should have a per-thread counter and not use cur_unique_id.counter
  * XXX: in all threads, because this is bad for performance on multi-processor
@@ -129,7 +119,7 @@ static unique_id_rec cur_unique_id;
 /*
  * Number of elements in the structure unique_id_rec.
  */
-#define UNIQUE_ID_REC_MAX 5
+#define UNIQUE_ID_REC_MAX 4
 
 static unsigned short unique_id_rec_offset[UNIQUE_ID_REC_MAX],
                       unique_id_rec_size[UNIQUE_ID_REC_MAX],
@@ -138,24 +128,17 @@ static unsigned short unique_id_rec_offset[UNIQUE_ID_REC_MAX],
 
 static int unique_id_global_init(apr_pool_t *p, apr_pool_t *plog, apr_pool_t *ptemp, server_rec *main_server)
 {
-    char str[APRMAXHOSTLEN + 1];
-    apr_status_t rv;
-    char *ipaddrstr;
-    apr_sockaddr_t *sockaddr;
-
     /*
      * Calculate the sizes and offsets in cur_unique_id.
      */
     unique_id_rec_offset[0] = APR_OFFSETOF(unique_id_rec, stamp);
     unique_id_rec_size[0] = sizeof(cur_unique_id.stamp);
-    unique_id_rec_offset[1] = APR_OFFSETOF(unique_id_rec, in_addr);
-    unique_id_rec_size[1] = sizeof(cur_unique_id.in_addr);
-    unique_id_rec_offset[2] = APR_OFFSETOF(unique_id_rec, pid);
-    unique_id_rec_size[2] = sizeof(cur_unique_id.pid);
-    unique_id_rec_offset[3] = APR_OFFSETOF(unique_id_rec, counter);
-    unique_id_rec_size[3] = sizeof(cur_unique_id.counter);
-    unique_id_rec_offset[4] = APR_OFFSETOF(unique_id_rec, thread_index);
-    unique_id_rec_size[4] = sizeof(cur_unique_id.thread_index);
+    unique_id_rec_offset[1] = APR_OFFSETOF(unique_id_rec, root);
+    unique_id_rec_size[1] = sizeof(cur_unique_id.root);
+    unique_id_rec_offset[2] = APR_OFFSETOF(unique_id_rec, counter);
+    unique_id_rec_size[2] = sizeof(cur_unique_id.counter);
+    unique_id_rec_offset[3] = APR_OFFSETOF(unique_id_rec, thread_index);
+    unique_id_rec_size[3] = sizeof(cur_unique_id.thread_index);
     unique_id_rec_total_size = unique_id_rec_size[0] + unique_id_rec_size[1] +
                                unique_id_rec_size[2] + unique_id_rec_size[3] +
                                unique_id_rec_size[4];
@@ -165,86 +148,13 @@ static int unique_id_global_init(apr_pool_t *p, apr_pool_t *plog, apr_pool_t *pt
      */
     unique_id_rec_size_uu = (unique_id_rec_total_size*8+5)/6;
 
-    /*
-     * Now get the global in_addr.  Note that it is not sufficient to use one
-     * of the addresses from the main_server, since those aren't as likely to
-     * be unique as the physical address of the machine
-     */
-    if ((rv = apr_gethostname(str, sizeof(str) - 1, p)) != APR_SUCCESS) {
-        ap_log_error(APLOG_MARK, APLOG_ALERT, rv, main_server, APLOGNO(01563)
-          "unable to find hostname of the server");
-        return HTTP_INTERNAL_SERVER_ERROR;
-    }
-
-    if ((rv = apr_sockaddr_info_get(&sockaddr, str, AF_INET, 0, 0, p)) == APR_SUCCESS) {
-        global_in_addr = sockaddr->sa.sin.sin_addr.s_addr;
-    }
-    else {
-        ap_log_error(APLOG_MARK, APLOG_ALERT, rv, main_server, APLOGNO(01564)
-                    "unable to find IPv4 address of \"%s\"", str);
-#if APR_HAVE_IPV6
-        if ((rv = apr_sockaddr_info_get(&sockaddr, str, AF_INET6, 0, 0, p)) == APR_SUCCESS) {
-            memcpy(&global_in_addr,
-                   (char *)sockaddr->ipaddr_ptr + sockaddr->ipaddr_len - sizeof(global_in_addr),
-                   sizeof(global_in_addr));
-            ap_log_error(APLOG_MARK, APLOG_ALERT, rv, main_server, APLOGNO(01565)
-                         "using low-order bits of IPv6 address "
-                         "as if they were unique");
-        }
-        else
-#endif
-        return HTTP_INTERNAL_SERVER_ERROR;
-    }
-
-    apr_sockaddr_ip_get(&ipaddrstr, sockaddr);
-    ap_log_error(APLOG_MARK, APLOG_INFO, 0, main_server, APLOGNO(01566) "using ip addr %s",
-                 ipaddrstr);
-
-    /*
-     * If the server is pummelled with restart requests we could possibly end
-     * up in a situation where we're starting again during the same second
-     * that has been used in previous identifiers.  Avoid that situation.
-     *
-     * In truth, for this to actually happen not only would it have to restart
-     * in the same second, but it would have to somehow get the same pids as
-     * one of the other servers that was running in that second. Which would
-     * mean a 64k wraparound on pids ... not very likely at all.
-     *
-     * But protecting against it is relatively cheap.  We just sleep into the
-     * next second.
-     */
-    apr_sleep(apr_time_from_sec(1) - apr_time_usec(apr_time_now()));
     return OK;
 }
 
 static void unique_id_child_init(apr_pool_t *p, server_rec *s)
 {
-    pid_t pid;
-
-    /*
-     * Note that we use the pid because it's possible that on the same
-     * physical machine there are multiple servers (i.e. using Listen). But
-     * it's guaranteed that none of them will share the same pids between
-     * children.
-     *
-     * XXX: for multithread this needs to use a pid/tid combo and probably
-     * needs to be expanded to 32 bits
-     */
-    pid = getpid();
-    cur_unique_id.pid = pid;
-
-    /*
-     * Test our assumption that the pid is 32-bits.  It's possible that
-     * 64-bit machines will declare pid_t to be 64 bits but only use 32
-     * of them.  It would have been really nice to test this during
-     * global_init ... but oh well.
-     */
-    if ((pid_t)cur_unique_id.pid != pid) {
-        ap_log_error(APLOG_MARK, APLOG_CRIT, 0, s, APLOGNO(01567)
-                    "oh no! pids are greater than 32-bits!  I'm broken!");
-    }
-
-    cur_unique_id.in_addr = global_in_addr;
+    ap_random_insecure_bytes(&cur_unique_id.root,
+                             sizeof(cur_unique_id.root));
 
     /*
      * If we use 0 as the initial counter we have a little less protection
@@ -253,13 +163,6 @@ static void unique_id_child_init(apr_pool_t *p, server_rec *s)
      */
     ap_random_insecure_bytes(&cur_unique_id.counter,
                              sizeof(cur_unique_id.counter));
-
-    /*
-     * We must always use network ordering for these bytes, so that
-     * identifiers are comparable between machines of different byte
-     * orderings.  Note in_addr is already in network order.
-     */
-    cur_unique_id.pid = htonl(cur_unique_id.pid);
 }
 
 /* NOTE: This is *NOT* the same encoding used by base64encode ... the last two
@@ -291,10 +194,8 @@ static const char *gen_unique_id(const request_rec *r)
     unsigned short counter;
     int i,j,k;
 
-    new_unique_id.in_addr = cur_unique_id.in_addr;
-    new_unique_id.pid = cur_unique_id.pid;
+    memcpy(&new_unique_id.root, &cur_unique_id.root, ROOT_SIZE);
     new_unique_id.counter = cur_unique_id.counter;
-
     new_unique_id.stamp = htonl((unsigned int)apr_time_sec(r->request_time));
     new_unique_id.thread_index = htonl((unsigned int)r->connection->id);
 

--------------070507070701020809040904--