Subject: Re: Bug that spans tomcat and tomcat-native
From: Mark Thomas
To: Tomcat Developers List
Date: Mon, 27 Jun 2016 14:11:17 +0100
In-Reply-To: <7db3621c-15de-6d67-95ce-f806af802569@apache.org>

I believe I have an explanation for what is going on that fits both the
reported behaviour and the proposed fix.

Background
==========

OpenSSL tracks a list of the most recent errors for each thread in a
hash map keyed on the thread (int_thread_hash in err.c).

Reading and writing to this hash map is protected by a lock.

The hash map is created and populated lazily.

tc-native calls ERR_clear_error() before every call to
SSL_do_handshake(), SSL_read() and SSL_write(). The call to
ERR_clear_error() either clears the error list for the current thread
or inserts a new empty list into the hash map if the thread is not
already present.

The performance problem was tracked down to threads waiting in
ERR_clear_error() to obtain the write lock for the hash map.

The proposed solution was to call ERR_remove_thread_state() just before
the current Tomcat thread processing the connection is returned to the
thread pool. This method removes the current thread and its associated
error list from the hash map.

Analysis
========

The proposed solution, calling ERR_remove_thread_state(), adds a call
that also obtains the write lock for the hash map. Since adding another
call that takes the write lock improves matters, the problem cannot
simply be the cost of obtaining the lock. It is more likely contention
for the lock because one or more operations taking place while the lock
is held are taking a long time.

Removing unused threads from the hash map removes the bottleneck. This
points towards the hash map being the source of the problem.

Testing by the OP showed that as soon as a test had been run that
required ~400 concurrent threads, performance dropped significantly. It
did not get noticeably worse if the same 400 thread test was run
repeatedly.
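
The behaviour described so far can be illustrated with a toy model. This
is only a sketch under stated assumptions: it is not OpenSSL code, the
names model_table, model_clear_error(), model_remove_thread_state() and
model_run_burst() are hypothetical, and locking is left out entirely. It
models only the bookkeeping: an entry is inserted lazily the first time
a thread clears its errors, and it stays until that thread removes it.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_THREADS 1024

/* Toy stand-in for the per-thread error table (hypothetical, not
 * OpenSSL's data structure): one slot per thread ID currently known. */
typedef struct {
    uint64_t ids[MODEL_MAX_THREADS];
    size_t count;
} model_table;

/* Return the index of id in the table, or -1 if it is absent. */
static long model_find(const model_table *t, uint64_t id) {
    for (size_t i = 0; i < t->count; i++) {
        if (t->ids[i] == id) {
            return (long)i;
        }
    }
    return -1;
}

/* Models the effect described for ERR_clear_error(): if the thread is
 * not yet present, an empty entry is inserted for it.
 * Returns 0 on success, -1 if the toy table is full. */
int model_clear_error(model_table *t, uint64_t id) {
    if (model_find(t, id) >= 0) {
        return 0;               /* entry exists; nothing to insert */
    }
    if (t->count == MODEL_MAX_THREADS) {
        return -1;
    }
    t->ids[t->count++] = id;
    return 0;
}

/* Models the effect described for ERR_remove_thread_state(): the
 * thread's entry is dropped from the table if it is present. */
void model_remove_thread_state(model_table *t, uint64_t id) {
    long i = model_find(t, id);
    if (i < 0) {
        return;
    }
    t->ids[i] = t->ids[--t->count];  /* move last entry into the gap */
}

/* Simulate nthreads pool threads (IDs 1..nthreads) each handling one
 * connection. If remove_on_return is non-zero, each thread removes its
 * entry before going back to the pool. Returns the table size after. */
size_t model_run_burst(model_table *t, size_t nthreads, int remove_on_return) {
    for (size_t i = 0; i < nthreads; i++) {
        model_clear_error(t, (uint64_t)(i + 1));
    }
    if (remove_on_return) {
        for (size_t i = 0; i < nthreads; i++) {
            model_remove_thread_state(t, (uint64_t)(i + 1));
        }
    }
    return t->count;
}
```

In this model a 400 thread burst leaves 400 entries behind, repeating
the burst with the same thread IDs adds none, and removing on return
empties the table again. That matches the reported pattern, but it says
nothing about why lookups in the real table would be slow.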

My testing indicated, on OSX at least, that the thread IDs used in the
hash map were stable and that uncontrolled growth of the hash map was
unlikely to be the cause.

The manner in which thread IDs are generated varies by platform. On
Linux, where this problem was observed, the thread ID is derived from
(is normally equal to) the memory address of the per thread errno
variable. This means that thread IDs tend to be concentrated in a
relatively narrow range of values. For example, in a simple 10 thread
test on OSX thread IDs ranged from 123145344839680 to 123145354387455.

Thread IDs therefore fall within a range of about 10^7 out of a
possible range of 1.8x10^19, i.e. a very small, contiguous range.

Hash maps use hashing functions to ensure that entries are (roughly)
evenly distributed between the available buckets. The hash function,
err_state_hash, used for the thread IDs in OpenSSL is threadID * 13.

Supposition
===========

The hash function used (multiply by 13) is insufficient to distribute
the resulting values across multiple buckets because they will still
fall in a relatively narrow band. Therefore all the threads end up in a
single bucket, which makes the performance of the hash map poor. This in
turn makes calls to thread_get_item() slow because it does a hash map
lookup. This lookup is performed with the read lock held for the hash
map, which in turn will slow down the calls that require the write lock.

Proposal
========

The analysis and supposition above need to be checked by someone with a
better understanding of C than me. Assuming my work is correct, the next
step is to look at possible fixes. I do not believe that patching
OpenSSL is a viable option.

The OpenSSL API needs to be reviewed to see if there is a way to avoid
the calls that require the write lock.

If the write lock cannot be avoided then we need to see if there is a
better place to call ERR_remove_thread_state().
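
One possible starting point for checking the supposition is a small
model of how the hash spreads thread IDs over buckets. This is a sketch,
not OpenSSL code. It assumes the bucket is chosen as hash % nbuckets,
which has not been verified against the OpenSSL source, and the helper
names model_hash(), count_buckets_used() and fill_strided_ids() are
hypothetical. Only the multiply by 13 comes from the analysis above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The hash described above for err_state_hash: thread ID * 13. */
static uint64_t model_hash(uint64_t thread_id) {
    return thread_id * 13u;
}

/* Count how many distinct buckets the given thread IDs land in,
 * ASSUMING bucket = hash % nbuckets (an assumption of this sketch).
 * Returns 0 if nbuckets is 0 or the scratch allocation fails. */
size_t count_buckets_used(const uint64_t *ids, size_t n, size_t nbuckets) {
    if (nbuckets == 0) {
        return 0;
    }
    unsigned char *used = calloc(nbuckets, 1);
    if (used == NULL) {
        return 0;
    }
    size_t distinct = 0;
    for (size_t i = 0; i < n; i++) {
        size_t b = (size_t)(model_hash(ids[i]) % nbuckets);
        if (!used[b]) {
            used[b] = 1;
            distinct++;
        }
    }
    free(used);
    return distinct;
}

/* Fill ids with base, base + stride, base + 2 * stride, ...
 * Used to imitate thread IDs that are evenly spaced addresses. */
void fill_strided_ids(uint64_t *ids, size_t n, uint64_t base, uint64_t stride) {
    for (size_t i = 0; i < n; i++) {
        ids[i] = base + (uint64_t)i * stride;
    }
}
```

As an illustration, take the lowest thread ID observed on OSX
(123145344839680) as the base, a made-up page-aligned stride of 0x83000
and 256 buckets. All 10 IDs then land in one bucket, because every ID is
a multiple of 4096 and multiplying by 13 leaves the low bits at zero.
Ten consecutive IDs starting at 1 use 10 different buckets. The result
depends heavily on the assumed bucket count and stride, so it shows that
the supposition is plausible, not that it is what happens in OpenSSL.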

I'd like to fix this entirely in tc-native, but that may mean calling
ERR_remove_thread_state() more frequently, which could create its own
performance problems.

Nate - I may have some patches for you to test in the next few days.

Mark