Subject: Re: jk_handler::mod_jk.c (2917): Could not get endpoint for worker
 ...
To: Tomcat Users List <users@tomcat.apache.org>,
        Clemens Wyss DEV <clemensdev@mysign.ch>
References: <2ecd8954cf1541548a95d810338618c8@Exchange2013.mysigndomain.corp>
From: Rainer Jung <rainer.jung@kippdata.de>
Message-ID: <18f32251-a42f-02d7-427a-32caa52e6720@kippdata.de>
Date: Fri, 21 Sep 2018 17:34:47 +0200
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101
 Thunderbird/60.0
MIME-Version: 1.0
In-Reply-To: <2ecd8954cf1541548a95d810338618c8@Exchange2013.mysigndomain.corp>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit

On 15.09.2018 at 12:50, Clemens Wyss DEV wrote:
> Hi all,
> we are seeing quite a few:
> "[Mon Sep 10 15:19:46 2018] [27562:140532026529536] [error] jk_handler::mod_jk.c (2917): Could not get endpoint for worker=testAPJ"
> 
> errors in our mod_jk.log. Worker properties are as follows:
> 
> ...
> worker.list=testAPJ
> 
> worker.testAPJ.port=8009
> worker.testAPJ.host=127.0.0.1
> worker.testAPJ.type=ajp13
> worker.testAPJ.socket_keepalive=1
> worker.testAJP.connection_pool_timeout=600
> ...
> 
> At that point Apache seems to be stuck/struggling (but our tomcat does not seem to be under pressure). Restarting Apache solves the issue ... till it pops up again ...
> 
> What is happening? What needs to be tuned?
> 
> Apache 2.4.34, tried both event- and worker-MPM

Assuming this is mod_jk 1.2.44? Are there more settings for worker testAPJ?

Normally mod_jk creates as many local connection structures (named
endpoints) in each Apache httpd child process as that process has
worker threads. When an httpd worker thread wants to talk to Tomcat, it
retrieves such an endpoint and uses it to create and handle the
communication.

The error you observe means that all endpoints were already in use.
Since we create as many structures as there are worker threads
(everything is per httpd process), this should not happen (and I don't
remember any case where it did happen).

Ideas what could go wrong:

- setting the worker property connection_pool_size or the deprecated 
cachesize for worker testAPJ to a smaller value than your httpd 
ThreadsPerChild (32 from your config snippet). If not set, mod_jk 
automatically detects the number of httpd worker threads

- setting connection_acquire_timeout to a small value. By default it
equals retries*retry_interval, which in turn by default equals
2*100 milliseconds. Before it shows you the error message, mod_jk will
retry getting an endpoint "retries" times with a sleep pause of
"retry_interval" milliseconds between attempts, but no longer than
connection_acquire_timeout milliseconds in total.

- retrieving an endpoint must acquire a lock first. On some platforms
locking can lead to problems like false positives in deadlock detection.
But I think this can't happen here since the code doesn't check the
return value of the locking.

- memory shortage leading to failing allocations (not likely but possible)
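
To make the first two points concrete, here is a minimal
workers.properties sketch for worker testAPJ. The property names are the
ones mentioned above; the values are only illustrative and simply
restate the numbers given above (32 threads per child, 2 retries,
100 milliseconds). They are not a tuning recommendation.

```properties
# Illustrative values only, not a recommendation.

# Usually best left unset so that mod_jk detects the number of httpd
# worker threads itself. If set, it should not be smaller than
# ThreadsPerChild (32 in this thread).
# worker.testAPJ.connection_pool_size=32

# Defaults as described above: 2 retries, 100 ms apart.
worker.testAPJ.retries=2
worker.testAPJ.retry_interval=100

# Upper bound in milliseconds for waiting on a free endpoint.
# When not set, it defaults to retries * retry_interval = 200.
worker.testAPJ.connection_acquire_timeout=200
```

If any of these properties appear in your configuration with smaller
values than shown here, they would be worth checking first.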

Do you see any other log messages? Anything in the httpd error log or
especially the mod_jk log? There should be a WARN message of type
"Unable to get the free endpoint for worker %s from %u slots", but maybe
there are more messages before that final problem happens? What do you
see with JkLogLevel info?

Does the problem happen under high load or when your backend gets slow? 
What does "netstat -anp | grep 8009" show when the hang occurs?

Regards,

Rainer
