Return-Path: X-Original-To: apmail-httpd-dev-archive@www.apache.org Delivered-To: apmail-httpd-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 49403FFDC for ; Tue, 9 Jul 2013 08:01:33 +0000 (UTC) Received: (qmail 33116 invoked by uid 500); 9 Jul 2013 08:01:31 -0000 Delivered-To: apmail-httpd-dev-archive@httpd.apache.org Received: (qmail 32986 invoked by uid 500); 9 Jul 2013 08:01:27 -0000 Mailing-List: contact dev-help@httpd.apache.org; run by ezmlm Precedence: bulk Reply-To: dev@httpd.apache.org list-help: list-unsubscribe: List-Post: List-Id: Delivered-To: mailing list dev@httpd.apache.org Received: (qmail 32973 invoked by uid 99); 9 Jul 2013 08:01:25 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 09 Jul 2013 08:01:25 +0000 X-ASF-Spam-Status: No, hits=-5.0 required=5.0 tests=RCVD_IN_DNSWL_HI,SPF_HELO_PASS,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of jkaluza@redhat.com designates 209.132.183.28 as permitted sender) Received: from [209.132.183.28] (HELO mx1.redhat.com) (209.132.183.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 09 Jul 2013 08:01:20 +0000 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r6980wU0022932 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Tue, 9 Jul 2013 04:00:59 -0400 Received: from [10.34.24.234] (dhcp-24-234.brq.redhat.com [10.34.24.234]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r6980v38023260 for ; Tue, 9 Jul 2013 04:00:57 -0400 Message-ID: <51DBC313.5000103@redhat.com> Date: Tue, 09 Jul 2013 10:00:19 +0200 From: =?ISO-8859-15?Q?Jan_Kalu=B8a?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130514 Thunderbird/17.0.6 MIME-Version: 1.0 To: dev@httpd.apache.org Subject: Re: [PATCH] mod_unique_id: use ap_random_insecure_bytes() to get unique ID References: <51CAE356.9020109@redhat.com> <201307060158.15619.sf@sfritsch.de> In-Reply-To: <201307060158.15619.sf@sfritsch.de> Content-Type: multipart/mixed; boundary="------------070507070701020809040904" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24 X-Virus-Checked: Checked by ClamAV on apache.org This is a multi-part message in MIME format. --------------070507070701020809040904 Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 8bit On 07/06/2013 01:58 AM, Stefan Fritsch wrote: > On Wednesday 26 June 2013, Jan Kalu�a wrote: >> currently mod_unique_id uses apr_gethostname(...) and PID pair as a >> base to generate unique ID. The way how it's implemented brings >> some problems: >> >> 1. For IPv6-only hosts it uses low-order bits of IPv6 address as if >> they were unique, which is wrong. >> >> 2. It relies on working DNS. It can happen that hostname does not >> have the IP assigned during httpd start (for example during the >> boot) and I think it is still valid use-case (without >> mod_unique_id module loaded, httpd works well in this case). >> >> 3. It calls 1 second sleep to overcome possible usage of the same >> PID after restart as the one used before the restart. >> >> If I'm right, we could fix the problems above by using >> ap_random_insecure_bytes instead of "in_addr"/"pid" pair as a base >> for unique ID generation. It would also make the code simpler. I >> think the randomness generated by ap_random_insecure_bytes() is at >> least the same as the one introduced by apr_gethostname() + pid >> pair. >> >> The attached patch implements it by removing in_addr/pid fields >> unique_id_rec struct and introduces new "root" field which is >> initialized using ap_random_insecure_bytes() function and is used >> as a base for unique IDs. > > I agree in principle, but I would prefer the ID length to not increase > that much. You replace 8 bytes ip+pid with 20 bytes "root", which > means 16 bytes increase in base64 format. The unique id is also used > as a request log id and having an additional 16 bytes of prefix in the > error log (if one uses that feature) does not really appeal to me. Thanks for reviewing the patch. I agree 20 bytes could be too much. I have changed my patch to have only 10 bytes long root. I will check the Daniel's ideas mentioned in another mail in this thread and try to implement it, but if we are going to do it my way, I think this patch should be OK. > But 20 bytes may be excessive, anyway: If one assumes 10000 servers > over which the IDs needs to be unique, and uses a 8 byte "root" field, > the probability for a collision is 3*10^(-12) if I did the math > correctly. This still has to be multiplied with the number of restarts > of the servers, so to be save we could use a 10 byte "root" field, > giving a collision probability (per server restart) of 4*10^(-17), > which should be ok. > Regards, Jan Kaluza --------------070507070701020809040904 Content-Type: text/x-patch; name="httpd-2.4.4-uniqueid-no-addr.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="httpd-2.4.4-uniqueid-no-addr.patch" diff --git a/modules/metadata/mod_unique_id.c b/modules/metadata/mod_unique_id.c index 8bd858f..8756055 100644 --- a/modules/metadata/mod_unique_id.c +++ b/modules/metadata/mod_unique_id.c @@ -31,14 +31,11 @@ #include "http_log.h" #include "http_protocol.h" /* for ap_hook_post_read_request */ -#if APR_HAVE_UNISTD_H -#include /* for getpid() */ -#endif +#define ROOT_SIZE 10 typedef struct { unsigned int stamp; - unsigned int in_addr; - unsigned int pid; + char root[ROOT_SIZE]; unsigned short counter; unsigned int thread_index; } unique_id_rec; @@ -64,18 +61,13 @@ typedef struct { * gethostbyname (gethostname()) is unique across all the machines at the * "site". * - * We also further assume that pids fit in 32-bits. If something uses more - * than 32-bits, the fix is trivial, but it requires the unrolled uuencoding - * loop to be extended. * A similar fix is needed to support multithreaded - * servers, using a pid/tid combo. - * - * Together, the in_addr and pid are assumed to absolutely uniquely identify + * The root is assumed to absolutely uniquely identify * this one child from all other currently running children on all servers * (including this physical server if it is running multiple httpds) from each * other. * * The stamp and counter are used to distinguish all hits for a particular - * (in_addr,pid) pair. The stamp is updated using r->request_time, + * root. The stamp is updated using r->request_time, * saving cpu cycles. The counter is never reset, and is used to permit up to * 64k requests in a single second by a single child. * @@ -92,7 +84,7 @@ typedef struct { * module change. * * It is highly desirable that identifiers exist for "eternity". But future - * needs (such as much faster webservers, moving to 64-bit pids, or moving to a + * needs (such as much faster webservers, or moving to a * multithreaded server) may dictate a need to change the contents of * unique_id_rec. Such a future implementation should ensure that the first * field is still a time_t stamp. By doing that, it is possible for a site to @@ -116,8 +108,6 @@ typedef struct { * htonl/ntohl. Well, this shouldn't be a problem till year 2106. */ -static unsigned global_in_addr; - /* * XXX: We should have a per-thread counter and not use cur_unique_id.counter * XXX: in all threads, because this is bad for performance on multi-processor @@ -129,7 +119,7 @@ static unique_id_rec cur_unique_id; /* * Number of elements in the structure unique_id_rec. */ -#define UNIQUE_ID_REC_MAX 5 +#define UNIQUE_ID_REC_MAX 4 static unsigned short unique_id_rec_offset[UNIQUE_ID_REC_MAX], unique_id_rec_size[UNIQUE_ID_REC_MAX], @@ -138,24 +128,17 @@ static unsigned short unique_id_rec_offset[UNIQUE_ID_REC_MAX], static int unique_id_global_init(apr_pool_t *p, apr_pool_t *plog, apr_pool_t *ptemp, server_rec *main_server) { - char str[APRMAXHOSTLEN + 1]; - apr_status_t rv; - char *ipaddrstr; - apr_sockaddr_t *sockaddr; - /* * Calculate the sizes and offsets in cur_unique_id. */ unique_id_rec_offset[0] = APR_OFFSETOF(unique_id_rec, stamp); unique_id_rec_size[0] = sizeof(cur_unique_id.stamp); - unique_id_rec_offset[1] = APR_OFFSETOF(unique_id_rec, in_addr); - unique_id_rec_size[1] = sizeof(cur_unique_id.in_addr); - unique_id_rec_offset[2] = APR_OFFSETOF(unique_id_rec, pid); - unique_id_rec_size[2] = sizeof(cur_unique_id.pid); - unique_id_rec_offset[3] = APR_OFFSETOF(unique_id_rec, counter); - unique_id_rec_size[3] = sizeof(cur_unique_id.counter); - unique_id_rec_offset[4] = APR_OFFSETOF(unique_id_rec, thread_index); - unique_id_rec_size[4] = sizeof(cur_unique_id.thread_index); + unique_id_rec_offset[1] = APR_OFFSETOF(unique_id_rec, root); + unique_id_rec_size[1] = sizeof(cur_unique_id.root); + unique_id_rec_offset[2] = APR_OFFSETOF(unique_id_rec, counter); + unique_id_rec_size[2] = sizeof(cur_unique_id.counter); + unique_id_rec_offset[3] = APR_OFFSETOF(unique_id_rec, thread_index); + unique_id_rec_size[3] = sizeof(cur_unique_id.thread_index); unique_id_rec_total_size = unique_id_rec_size[0] + unique_id_rec_size[1] + unique_id_rec_size[2] + unique_id_rec_size[3] + unique_id_rec_size[4]; @@ -165,86 +148,13 @@ static int unique_id_global_init(apr_pool_t *p, apr_pool_t *plog, apr_pool_t *pt */ unique_id_rec_size_uu = (unique_id_rec_total_size*8+5)/6; - /* - * Now get the global in_addr. Note that it is not sufficient to use one - * of the addresses from the main_server, since those aren't as likely to - * be unique as the physical address of the machine - */ - if ((rv = apr_gethostname(str, sizeof(str) - 1, p)) != APR_SUCCESS) { - ap_log_error(APLOG_MARK, APLOG_ALERT, rv, main_server, APLOGNO(01563) - "unable to find hostname of the server"); - return HTTP_INTERNAL_SERVER_ERROR; - } - - if ((rv = apr_sockaddr_info_get(&sockaddr, str, AF_INET, 0, 0, p)) == APR_SUCCESS) { - global_in_addr = sockaddr->sa.sin.sin_addr.s_addr; - } - else { - ap_log_error(APLOG_MARK, APLOG_ALERT, rv, main_server, APLOGNO(01564) - "unable to find IPv4 address of \"%s\"", str); -#if APR_HAVE_IPV6 - if ((rv = apr_sockaddr_info_get(&sockaddr, str, AF_INET6, 0, 0, p)) == APR_SUCCESS) { - memcpy(&global_in_addr, - (char *)sockaddr->ipaddr_ptr + sockaddr->ipaddr_len - sizeof(global_in_addr), - sizeof(global_in_addr)); - ap_log_error(APLOG_MARK, APLOG_ALERT, rv, main_server, APLOGNO(01565) - "using low-order bits of IPv6 address " - "as if they were unique"); - } - else -#endif - return HTTP_INTERNAL_SERVER_ERROR; - } - - apr_sockaddr_ip_get(&ipaddrstr, sockaddr); - ap_log_error(APLOG_MARK, APLOG_INFO, 0, main_server, APLOGNO(01566) "using ip addr %s", - ipaddrstr); - - /* - * If the server is pummelled with restart requests we could possibly end - * up in a situation where we're starting again during the same second - * that has been used in previous identifiers. Avoid that situation. - * - * In truth, for this to actually happen not only would it have to restart - * in the same second, but it would have to somehow get the same pids as - * one of the other servers that was running in that second. Which would - * mean a 64k wraparound on pids ... not very likely at all. - * - * But protecting against it is relatively cheap. We just sleep into the - * next second. - */ - apr_sleep(apr_time_from_sec(1) - apr_time_usec(apr_time_now())); return OK; } static void unique_id_child_init(apr_pool_t *p, server_rec *s) { - pid_t pid; - - /* - * Note that we use the pid because it's possible that on the same - * physical machine there are multiple servers (i.e. using Listen). But - * it's guaranteed that none of them will share the same pids between - * children. - * - * XXX: for multithread this needs to use a pid/tid combo and probably - * needs to be expanded to 32 bits - */ - pid = getpid(); - cur_unique_id.pid = pid; - - /* - * Test our assumption that the pid is 32-bits. It's possible that - * 64-bit machines will declare pid_t to be 64 bits but only use 32 - * of them. It would have been really nice to test this during - * global_init ... but oh well. - */ - if ((pid_t)cur_unique_id.pid != pid) { - ap_log_error(APLOG_MARK, APLOG_CRIT, 0, s, APLOGNO(01567) - "oh no! pids are greater than 32-bits! I'm broken!"); - } - - cur_unique_id.in_addr = global_in_addr; + ap_random_insecure_bytes(&cur_unique_id.root, + sizeof(cur_unique_id.root)); /* * If we use 0 as the initial counter we have a little less protection @@ -253,13 +163,6 @@ static void unique_id_child_init(apr_pool_t *p, server_rec *s) */ ap_random_insecure_bytes(&cur_unique_id.counter, sizeof(cur_unique_id.counter)); - - /* - * We must always use network ordering for these bytes, so that - * identifiers are comparable between machines of different byte - * orderings. Note in_addr is already in network order. - */ - cur_unique_id.pid = htonl(cur_unique_id.pid); } /* NOTE: This is *NOT* the same encoding used by base64encode ... the last two @@ -291,10 +194,8 @@ static const char *gen_unique_id(const request_rec *r) unsigned short counter; int i,j,k; - new_unique_id.in_addr = cur_unique_id.in_addr; - new_unique_id.pid = cur_unique_id.pid; + memcpy(&new_unique_id.root, &cur_unique_id.root, ROOT_SIZE); new_unique_id.counter = cur_unique_id.counter; - new_unique_id.stamp = htonl((unsigned int)apr_time_sec(r->request_time)); new_unique_id.thread_index = htonl((unsigned int)r->connection->id); --------------070507070701020809040904--