tomcat-dev mailing list archives

From Rainer Jung <rainer.j...@kippdata.de>
Subject Re: Bug that spans tomcat and tomcat-native
Date Mon, 27 Jun 2016 15:54:58 GMT
Hi Mark,

On 27.06.2016 at 15:11, Mark Thomas wrote:
> I believe I have an explanation for what is going on that fits both the
> reported behaviour and the proposed fix.
>
> Background
> ==========
>
> OpenSSL tracks a list of the most recent errors for each thread in a
> hash map keyed on the thread (int_thread_hash in err.c). Reading and
> writing to this hash map is protected by a lock. The hash map is created
> and populated lazily.
>
> tc-native calls ERR_clear_error() before every call to
> SSL_do_handshake(), SSL_read() and SSL_write(). The call to
> ERR_clear_error() either clears the error list for the current thread or,
> if the thread is not already present, inserts a new empty list into the
> hash map.
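>
> A minimal sketch of that call pattern (not the literal tc-native
> source, just its shape):
>
>     #include <openssl/err.h>
>     #include <openssl/ssl.h>
>
>     static int read_with_clear(SSL *ssl, char *buf, int len)
>     {
>         /* Empties this thread's error list; if the thread has no
>          * entry yet, one is inserted under the hash map's write
>          * lock. */
>         ERR_clear_error();
>         return SSL_read(ssl, buf, len);
>     }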
>
> The performance problem was tracked down to threads waiting in
> ERR_clear_error() to obtain the write lock for the hash map.
>
> The proposed solution was to call ERR_remove_thread_state() just before
> the current Tomcat thread processing the connection is returned to the
> thread pool. This method removes the current thread and its associated
> error list from the hash map.
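>
> I.e. something like the following at the point where the processor is
> done with the connection (a sketch; the hook name is made up):
>
>     #include <openssl/err.h>
>
>     static void on_connection_processed(void)
>     {
>         /* NULL means "the calling thread"; this takes the write
>          * lock and deletes the thread's entry from the hash map. */
>         ERR_remove_thread_state(NULL);
>     }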
>
>
> Analysis
> ========
>
> The proposed solution, calling ERR_remove_thread_state(), adds a call
> that also obtains the write lock for the hash map. This indicates that
> the problem is not the delay in acquiring the lock itself but contention
> for the lock: one or more operations performed while the lock is held
> are taking a long time.
>
> Removing unused threads from the hash map eliminates the bottleneck. This
> points towards the hash map being the source of the problem.
>
> Testing by the OP showed that as soon as a test had been run that
> required ~ 400 concurrent threads, performance dropped significantly. It
> did not get noticeably worse if the same 400-thread test was run repeatedly.
>
> My testing indicated, on OSX at least, that the thread IDs used in the
> hash map were stable and that uncontrolled growth of the hash map was
> unlikely to be the cause.
>
> The manner in which thread IDs are generated varies by platform. On
> Linux, where this problem was observed, the thread ID is derived from
> (is normally equal to) the memory address of the per thread errno
> variable. This means that thread IDs tend to be concentrated in a
> relatively narrow range of values. For example, in a simple 10-thread
> test on OSX, thread IDs ranged from 123145344839680 to 123145354387455.
>
> Thread IDs therefore fall within a ~10^7 range out of a possible range of
> 1.8x10^19, i.e. a very small, contiguous range.
>
> Hash maps use hashing functions to ensure that entries are (roughly)
> evenly distributed between the available buckets. The hash function,
> err_state_hash, used for the thread IDs in OpenSSL is threadID * 13.
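>
> For reference, the function in crypto/err/err.c is (from memory, so
> modulo details) essentially:
>
>     static unsigned long err_state_hash(const ERR_STATE *a)
>     {
>         return CRYPTO_THREADID_hash(&a->tid) * 13;
>     }
>
> where CRYPTO_THREADID_hash() reduces the CRYPTO_THREADID to an
> unsigned long.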
>
> Supposition
> ===========
>
> The hash function used (multiply by 13) is insufficient to distribute
> the resulting values across multiple buckets because they will still
> fall in a relatively narrow band. Therefore all the threads end up in a
> single bucket, which makes the performance of the hash map poor. This in
> turn makes calls to thread_get_item() slow because each call does a hash
> map lookup. The lookup is performed with the read lock held for the hash
> map, which then delays the calls that require the write lock.
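>
> One concrete way this can happen (an illustration, assuming Linux's
> default 8 MB thread stacks, so per-thread errno addresses that differ
> by multiples of 0x800000): multiplying by 13 preserves that spacing,
> and since lhash's bucket counts are small powers of two (16, 32, ...
> if I read lhash.c correctly), every such ID lands in the same bucket:
>
>     0x7f0000800000 * 13 mod 16 == 0
>     0x7f0001000000 * 13 mod 16 == 0
>     0x7f0001800000 * 13 mod 16 == 0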
>
> Proposal
> ========
>
> The analysis and supposition above need to be checked by someone with a
> better understanding of C than me. Assuming my work is correct, the next
> step is to look at possible fixes. I do not believe that patching
> OpenSSL is a viable option.
>
> The OpenSSL API needs to be reviewed to see if there is a way to avoid
> the calls that require the write lock.
>
> If the write lock cannot be avoided then we need to see if there is a
> better place to call ERR_remove_thread_state(). I'd like to fix this
> entirely in tc-native but that may mean calling
> ERR_remove_thread_state() more frequently which could create its own
> performance problems.
>
> Nate - I may have some patches for you to test in the next few days.
>
> Mark

Great analysis. I was really wondering what could make the hash map so 
huge, and hadn't thought about the hash function as the problem.

Before OpenSSL 1.1.0 there's a callback for applications to provide 
their own thread IDs:

https://www.openssl.org/docs/man1.0.2/crypto/CRYPTO_THREADID_set_callback.html

So we could probably work around the poor hashing function by passing 
in IDs that hash well (a pre-hashed ID?). Of course we would then lose 
the direct association of the OpenSSL thread ID with the real platform 
thread ID.

Currently our callback in tcnative is ssl_set_thread_id() which refers 
to ssl_thread_id(), which on Linux gets the ID from the APR function 
apr_os_thread_current(). So we could add some hashing formula in 
ssl_thread_id().
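
An untested sketch of that idea; the mixing step below is just one 
round of the MurmurHash3 64-bit finalizer (assuming a 64-bit unsigned 
long), nothing tc-native specific:

    #include "apr_portable.h"

    static unsigned long ssl_thread_id(void)
    {
        unsigned long x = (unsigned long) apr_os_thread_current();
        /* Mix the pointer-like ID so that aligned / nearly identical
         * IDs no longer collide after the "* 13" in err_state_hash. */
        x ^= x >> 33;
        x *= 0xff51afd7ed558ccdUL;
        x ^= x >> 33;
        return x;
    }

ssl_set_thread_id() would then hand the mixed value to OpenSSL via 
CRYPTO_THREADID_set_numeric() as it does today.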

In OpenSSL 1.1.0 this thread ID callback is gone. We'll see whether the 
lock hash problem is gone too ...

I think we could add some logging to the hash impl to verify the 
assumption that many thread IDs result in the same hash value. The 
lhash impl already seems to maintain quite a few statistics:

https://www.openssl.org/docs/man1.0.2/crypto/lh_stats.html

The source code shows the following counters in OpenSSL lhash.c:

     ret->num_nodes = MIN_NODES / 2;
     ret->num_alloc_nodes = MIN_NODES;
     ret->num_items = 0;
     ret->num_expands = 0;
     ret->num_expand_reallocs = 0;
     ret->num_contracts = 0;
     ret->num_contract_reallocs = 0;
     ret->num_hash_calls = 0;
     ret->num_comp_calls = 0;
     ret->num_insert = 0;
     ret->num_replace = 0;
     ret->num_delete = 0;
     ret->num_no_delete = 0;
     ret->num_retrieve = 0;
     ret->num_retrieve_miss = 0;
     ret->num_hash_comps = 0;

Something like adding a global atomic counter to a special OpenSSL build 
that counts calls to ERR_clear_error() and, whenever the count is a 
multiple of 100 (or some other value), dumps the statistics.
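
A rough sketch of what that could look like in crypto/err/err.c of such 
a debug build (int_thread_hash is the static error-state hash in that 
file; dumping stats without holding the ERR lock is racy, but good 
enough for a test):

    /* file scope, debug build only */
    static unsigned long err_clear_calls = 0;

    /* added inside ERR_clear_error(), existing body unchanged */
    unsigned long n = __sync_add_and_fetch(&err_clear_calls, 1);
    if (n % 100 == 0 && int_thread_hash != NULL)
        lh_node_usage_stats((const _LHASH *)int_thread_hash, stderr);

lh_node_usage_stats() prints a short summary of how many buckets are in 
use and how full they are, which should directly confirm (or refute) 
the single-bucket theory.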

Alternatively one could use a debugger, find the error-state lhash, and 
inspect its structure members directly.

WDYT?

Regards,

Rainer


