nifi-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joe Witt <joe.w...@gmail.com>
Subject Re: PutTCP connector not cleaning up dangling connections
Date Mon, 18 Sep 2017 13:22:40 GMT
Good catch.  Can you please be sure to cover in a JIRA?

That said, wouldn't we see that in the stack trace during the
problematic condition?

On Mon, Sep 18, 2017 at 9:16 AM, Bryan Bende <bbende@gmail.com> wrote:
> The code in SocketChannelSender that Davy pointed out could definitely
> be the problem...
>
> It makes a non-blocking channel and calls connect, then goes into a
> loop waiting for finishConnect() to return true, but if that doesn't
> happen before the configured timeout, then it throws an exception, but
> it doesn't first close the channel, and the processor also doesn't
> close the sender.
>
> As an example comparison, this code which is being used by PutTCP:
>
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-processor-utils/src/main/java/org/apache/nifi/processor/util/put/sender/SocketChannelSender.java#L45-L78
>
> Should be doing something like this where it catches, closes, and rethrows:
>
> https://github.com/apache/nifi/blob/master/nifi-commons/nifi-security-utils/src/main/java/org/apache/nifi/remote/io/socket/ssl/SSLSocketChannel.java#L140-L179
>
>
> On Mon, Sep 18, 2017 at 9:09 AM, Joe Witt <joe.witt@gmail.com> wrote:
>> Davy
>>
>> Interesting.  So in looking through the stack trace I don't see
>> anything related to sockets nifi has initiated to another service and
>> nothing for PutTCP.  I'm not saying that means there is nothing but
>> the stack traces only show the custom GetTCP processors.
>>
>> You can use netstat to show open sockets from the nifi process.  Can
>> you try that and share those?
>>
>> Does the NiFi UI show the processor as having stuck threads?  I'm
>> guessing not since there is nothing in the stack traces.
>>
>> Thanks
>>
>> On Mon, Sep 18, 2017 at 1:54 AM, ddewaele <ddewaele@gmail.com> wrote:
>>> Stopping the processor doesn't cleanup the tcp connection. It remains
>>> ESTABLISHED.
>>>
>>> There are 2 ways of getting out of it (none of them are ideal).
>>>
>>> - Restarting Nifi
>>> - Restarting the Moxa serial ports
>>>
>>> I've dumped the output in the following gist :
>>> https://gist.github.com/ddewaele/83705003740674962c1e133fb617f68c
>>>
>>> The GetTCP processor you'll see in the thread dump also interacts with the
>>> moxa. It is a Netty based custom processor we created (because there was no
>>> GetTCP at the time). However, we log all interactions (including client
>>> ports) with this processor and all of them end up getting closed correctly.
>>>
>>> So the "hanging" connection originated from the built-in PutTCP processor.
>>>
>>>
>>> Joe Witt wrote
>>>> If you stop the processor manually does it clean them up?
>>>>
>>>> When the connections appear stuck can you please get a thread dump?
>>>>
>>>> bin/nifi.sh dump
>>>>
>>>> The results end up in bootstrap.log.
>>>>
>>>> Thanks
>>>> Joe
>>>>
>>>> On Sep 17, 2017 2:22 PM, "ddewaele" &lt;
>>>
>>>> ddewaele@
>>>
>>>> &gt; wrote:
>>>>
>>>>> We are using NiFi PutTCP processors to send messages to a number of Moxa
>>>>> onCell ip gateway devices.
>>>>>
>>>>> These Moxa devices are running on a cellular network with not always
the
>>>>> most ideal connection. The Moxa only allows for a maximum of 2
>>>>> simultaneous
>>>>> client connections.
>>>>>
>>>>> What we notice is that although we specify connection / read timeouts
on
>>>>> both PutTCP and the Moxa, that sometimes a connection get "stuck". (In
>>>>> the
>>>>> moxa network monitoring we see 2 client sockets coming from PutTCP in
the
>>>>> ESTABLISHED state that never go away).
>>>>>
>>>>> This doesn't always happen, but often enough for it to be considered
a
>>>>> problem, as it requires a restart of the moxa ports to clear the
>>>>> connections
>>>>> (manual step). It typically happens when PutTCP experiences a Timeout.
>>>>>
>>>>> On the PutTCP processors we have the following settings :
>>>>>
>>>>> - Idle Connection Expiration : 30 seconds  (we've set this higher due
to
>>>>> bad
>>>>> gprs connection)
>>>>> - Timeout : 10 seconds (this is only used as a timeout for establishing
>>>>> the
>>>>> connection)
>>>>>
>>>>> On the Moxas we have
>>>>>
>>>>> - TCP alive check time : 2min (this should force the Moxa to close the
>>>>> socket)
>>>>>
>>>>> Yet for some reason the connection remains established.
>>>>>
>>>>> Here's what I found out :
>>>>>
>>>>> On the moxa I noticed a connection (with client port 48440) that is in
>>>>> ESTABLISHED mode for 4+ hours. (blocking other connections). On the Moxa
>>>>> I
>>>>> can see when the connection was established :
>>>>>
>>>>> 2017/09/17 14:20:29 [OpMode] Port01 Connect 10.192.2.90:48440
>>>>>
>>>>> I can track that down in Nifi via the logs (unfortunately PutTCP doesn't
>>>>> log
>>>>> client ports, but from the timestamp  I'm sure it's this connection :
>>>>>
>>>>> 2017-09-17 14:20:10,837 DEBUG [Timer-Driven Process Thread-10]
>>>>> o.apache.nifi.processors.standard.PutTCP
>>>>> PutTCP[id=80231a39-1008-1159-a6fa-1f9e3751d608] No available connections,
>>>>> creating a new one...
>>>>> 2017-09-17 14:20:20,860 ERROR [Timer-Driven Process Thread-10]
>>>>> o.apache.nifi.processors.standard.PutTCP
>>>>> PutTCP[id=80231a39-1008-1159-a6fa-1f9e3751d608] No available connections,
>>>>> and unable to create a new one, transferring
>>>>> StandardFlowFileRecord[uuid=79f2a166-5211-4d2d-9275-03f0ce4d5b29,claim=
>>>>> StandardContentClaim
>>>>> [resourceClaim=StandardResourceClaim[id=1505641210025-1,
>>>>> container=default,
>>>>> section=1], offset=84519, length=9],offset=0,name=
>>>>> 23934743676390659,size=9]
>>>>> to failure: java.net.SocketTimeoutException: Timed out connecting to
>>>>> 10.32.133.40:4001
>>>>> 2017-09-17 14:20:20,860 ERROR [Timer-Driven Process Thread-10]
>>>>> o.apache.nifi.processors.standard.PutTCP
>>>>> java.net.SocketTimeoutException: Timed out connecting to
>>>>> 10.32.133.40:4001
>>>>>         at
>>>>> org.apache.nifi.processor.util.put.sender.SocketChannelSender.open(
>>>>> SocketChannelSender.java:66)
>>>>> ~[nifi-processor-utils-1.1.0.jar:1.1.0]
>>>>>         at
>>>>> org.apache.nifi.processor.util.put.AbstractPutEventProcessor.createSender(
>>>>> AbstractPutEventProcessor.java:312)
>>>>> ~[nifi-processor-utils-1.1.0.jar:1.1.0]
>>>>>         at
>>>>> org.apache.nifi.processors.standard.PutTCP.createSender(PutTCP.java:121)
>>>>> [nifi-standard-processors-1.1.0.jar:1.1.0]
>>>>>         at
>>>>> org.apache.nifi.processor.util.put.AbstractPutEventProcessor.
>>>>> acquireSender(AbstractPutEventProcessor.java:334)
>>>>> ~[nifi-processor-utils-1.1.0.jar:1.1.0]
>>>>>         at
>>>>> org.apache.nifi.processors.standard.PutTCP.onTrigger(PutTCP.java:176)
>>>>> [nifi-standard-processors-1.1.0.jar:1.1.0]
>>>>>         at
>>>>> org.apache.nifi.controller.StandardProcessorNode.onTrigger(
>>>>> StandardProcessorNode.java:1099)
>>>>> [nifi-framework-core-1.1.0.jar:1.1.0]
>>>>>         at
>>>>> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(
>>>>> ContinuallyRunProcessorTask.java:136)
>>>>> [nifi-framework-core-1.1.0.jar:1.1.0]
>>>>>         at
>>>>> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(
>>>>> ContinuallyRunProcessorTask.java:47)
>>>>> [nifi-framework-core-1.1.0.jar:1.1.0]
>>>>>         at
>>>>> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(
>>>>> TimerDrivenSchedulingAgent.java:132)
>>>>> [nifi-framework-core-1.1.0.jar:1.1.0]
>>>>>         at
>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>> [na:1.8.0_111]
>>>>>         at java.util.concurrent.FutureTask.runAndReset(
>>>>> FutureTask.java:308)
>>>>> [na:1.8.0_111]
>>>>>         at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$
>>>>> ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>>>> [na:1.8.0_111]
>>>>>         at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(
>>>>> ScheduledThreadPoolExecutor.java:294)
>>>>> [na:1.8.0_111]
>>>>>         at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(
>>>>> ThreadPoolExecutor.java:1142)
>>>>> [na:1.8.0_111]
>>>>>         at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>>>> ThreadPoolExecutor.java:617)
>>>>> [na:1.8.0_111]
>>>>>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
>>>>>
>>>>>
>>>>> On an OS level I indeed see the TCP connection originated from Nifi :
>>>>>
>>>>> netstat -tn | grep 48440
>>>>>
>>>>> tcp      711      0 10.192.2.90:48440       10.32.133.40:4001
>>>>> ESTABLISHED
>>>>>
>>>>>
>>>>> lsof -i TCP:48440
>>>>>
>>>>> COMMAND  PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
>>>>> java    3424 root 1864u  IPv4 404675057      0t0  TCP
>>>>> NifiServer:48440->10.32.133.40:newoak (ESTABLISHED)
>>>>>
>>>>> ps -ef | grep 3424
>>>>>
>>>>> root      3424  3390  8 11:32 ?        00:44:33 java -classpath
>>>>> /opt/nifi-1.1.0/./conf.....
>>>>>
>>>>> Any ideas on how to debug this further ?
>>>>>
>>>>> I looked at the code in SocketChannelSender.java:66 and was wondering
it
>>>>> no
>>>>> additional cleanup is necessary when throwing the SocketTimeoutException.
>>>>> If
>>>>> the connection ends up getting established after the timeout, doesn't
>>>>> PutTCP
>>>>> need to clean that up ?
>>>>>
>>>>> Thx
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/
>>>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/

Mime
View raw message