directory-api mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shawn McKinney <smckin...@apache.org>
Subject Re: LdapConnectionPool.getConnection doing extraneous search?
Date Tue, 23 Mar 2021 16:41:48 GMT


> On Mar 23, 2021, at 10:45 AM, Emmanuel Lécharny <elecharny@gmail.com> wrote:
> 
> On 23/03/2021 14:24, Shawn McKinney wrote:
>>> On Mar 23, 2021, at 3:00 AM, Emmanuel Lécharny <elecharny@gmail.com> wrote:
>>> 
>>> 
>>> On 22/03/2021 19:41, Shawn McKinney wrote:
>>>>> On Mar 22, 2021, at 11:05 AM, Emmanuel Lécharny <elecharny@gmail.com>
wrote:
>>>>> 
>>>>> LDAP connections are quite stable. Again, this check is there to respect
the commons-pool API contract. If the connection is dead, then doing this check will let the
pool fetching another connection, which is good. OTOH, if you don't want to check connections
while fetching one, then you are on your own (ie, deal with the consequences of a bad connection).
>>>> Again, I must disagree.  Connections aren’t stable, subjected to env conditions
and we can never assume a connection is good.
>>> 
>>> You *can* assume it's good, and react exceptionally if it's not. That the fastest
way to deal with potential bad connections, if you want to avoid a check every time you pull
a connection from the pool. (but see later for another option)
>> There are 150 locations in fortress core where an ldap connection is being pulled
from the pool. No way I want to revisit all of that code.
>>> 
>>> Something in the API chain must do the job of testing and reconnect, every time
it’s pulled from the pool.
>>> 
>>> This is exactly what the testOnBorrow does ;-) But it's costly... (see later
for another option)
>>> 
>>>> Now, having said that, that’s exactly what I’m observing currently, with
the test on borrow flag turned off.
>>>> Let me explain the scenario:
>>>> 1. Start server
>>>> 2. Start client
>>>> This initializes a connection pool which creates and stores exactly 1 connection
(min 1, max1 )
>>>> 3. Set breakpoint in client on pool.getConnection method
>>>> 4. Restart the server.
>>>> 5. Client performs ldap op which triggers the breakpoint on getConnection
>>>> 6. Server at this point still hasn’t any connections with the client. 
The client ‘thinks’ it has connections in the pool, but these were broken on step 4.
>>>> 7. Step over the getConnection which then triggers the commons pool to execute:
>>>> ```                GenericObjectPool._factory.activateObject(latch.getPair().value)
>>>> ```
>>>> 8. A connection is made with the server, along with bind
>>>> ```
>>>> 6058e163 conn=1000 fd=14 ACCEPT from IP=127.0.0.1:35246 (IP=127.0.0.1:389)
>>>>      [exec] 6058e163 conn=1000 op=0 BIND dn="cn=manager,dc=example,dc=com"
method=128
>>>>      [exec] 6058e163 conn=1000 op=0 BIND dn="cn=manager,dc=example,dc=com"
mech=SIMPLE ssf=0
>>>>      [exec] 6058e163 conn=1000 op=0 RESULT tag=97 err=0 duration=1.667ms
>>>> ```
>>>> 9. Client continues with its ldap op successfully and is never the wiser
that the connections were all forcibly closed on server restart.
>>>> This is EXACTLY what I want to see all of which is done without the test
on borrow eliminating the extraneous search on every connection retrieval.
>>> 
>>> So that would mean we have some kind of 'retry' on connection operation: if it
fails, then let's assume the connection is broken, and redo it with a fresh connection.
>>> 
>> Yes, that’s what’s happening.
>> In the Commons pool lib, this is the block that gets executed:
>> ```             this._factory.activateObject(latch.getPair().value);
>> if (this._testOnBorrow &&     !this._factory.validateObject(latch.getPair().value))
{
>> throw new Exception("ValidateObject failed");
>> }
>> ```
>> In the first line above, activateObject, this code gets called, from our AbstractPoolableLdapConnectionFactory
class:
>> ```
>> public void activateObject(LdapConnection connection) throws LdapException {
>>   LOG.debug("Activating {}", connection);
>>   if (!connection.isConnected() ||
>>       !connection.isAuthenticated()) {
>>     LOG.debug("rebind due to connection dropped on {}", connection);
>>     this.connectionFactory.bindConnection(connection);
>> }
>> ```
>> The code performs a rebind which renews the connection.
>> All of this with testOnBorrow being false!
> 
> 
> The second bind is pretty brutal, as it will create a brand new connection.
> 
>> So, I’m still scratching my head figuring why we need this secondary level that
is wasting a round trip to the server.
> 
> Well oh well, not sure we need it at all, considering the piece of code you exhumated
;-)
> 

Yes, that’s pretty much where I’m at. I’ve changed fortress to turn it off by default.
 Also be good to think about changing the default behavior in teh API as well.

> 
>>> The problem is that the pool is passive: it does not react to any connection
event, like a connection closure. OTOH, when a connection is properly closed by the server
(ie an NoticeOfDisconnect - aka NoD - is generated by the server), the connection should process
it and die. Now we have an issue: the connection is just lying in the pool, not being used
by any client, so there is a missing step: removing the connection from the pool in this very
case. That can be something we can add to the current LDAP pool.
>>> 
>>> Note that if the server does not send a NoD, you are screwed. There is no way
to be sure that the connection is ok until you use it. OTOH, leveraging the setTestWhileIdle()
could be a solution to partially workaround the issue.
>> Here you’ve lost me.  I’m running the server in debug mode, inside a bash shell,
running in the foreground.
>> In my test I stop the server by ‘ctrl-c’ killing the shell.
>> The server is not reacting to this and sending a NoD.
> 
> Indeed, the worst case scenario. It's pretty much as if you simply pull out the ethernet
cable from the socket (oh, well, put your wifi box in a Faraday cage).
> 

Exactly!  A pretty good test.

> Yes, in this very case, there will be no NoD. TCP is a best effort protocol, the only
way to be sure that there is 'somebody' on the other side is to send some data to it and expect
some kind of response, in absence of which you would consider the connection dead in the water.
Add to that the existence of a timeout mechanism on the OS level (generally speaking set to
7200 s), which makes it critical to set up a keepalive system to ensure that the connection
never dies because of a timeout...
> 
> In the LDAP API, we could prevent - sort of - the connection from being tested on demand
(ie when you need it), by setting a check on idling (if the connection has idled for more
than a given period of time, then we would consider it's dead). It's not leverage in the API
atm.
> 

I’m actually thinking about turning off teh check on idle.  It’s not needed, right?


> 
> All that being said, with the current 'rebind' mechanism, you should be safe, I think.

Thanks Emmanuel.  This is where I was hoping we’d end up with the discussion.  

It’s a pretty big decision to make in a vacuum of a test env and wanted to get some more
eyeballs on it.

I’ll point it out specifically on the next release, so it’s not overlooked.

—
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: api-unsubscribe@directory.apache.org
For additional commands, e-mail: api-help@directory.apache.org


Mime
View raw message