commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Consultant Leon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DAEMON-366) Windows service fails to stop with error code 1053 (using Windows 7 service manager)
Date Sat, 24 Jun 2017 19:13:02 GMT

    [ https://issues.apache.org/jira/browse/DAEMON-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062098#comment-16062098
] 

Consultant Leon commented on DAEMON-366:
----------------------------------------

Firstly I've now managed to resolve the shutdown problem by fixing our application and ensure
it properly terminates all threads - this is confirmed as one solution.

Secondly I'm not sure what the best behaviour should be for procrun when a process does not
behave so I created the debug log to help you to determine if this is really a 1.0.15 procrun
issue as well. I do not see the 'waiting 1 minute message'.

So I reinstalled the service with option --LogLevel=Debug

Then perform sc stop TestC (TestC is the name of my service at this point)

Once I shutdown I see our own application log:
2017-06-24_20:43:07.711- INFO-[FileExportDestination-HelloWorld.OutputDest]-rename/move temporary
file ...
2017-06-24_20:43:07.711- INFO-[FileExportDestination-HelloWorld.OutputDest]-consume done [FileSource_15cdb6a2ca3_1],
status=PROCESSING 
2017-06-24_20:43:54.466- INFO-[Thread-3                                ]-Stopping APP service
'TestC'... 
2017-06-24_20:43:54.466- INFO-[Thread-3                                ]-HelloWorld-SYSTEM:
pause all sources configured with pause.before.shutdown=true, request remaining sources to
stop 
2017-06-24_20:43:54.482- INFO-[Thread-3                                ]-HelloWorld-SYSTEM:
all sources set to pause 'before shutdown' have now paused, other sources have stopped 
2017-06-24_20:43:54.482- INFO-[Thread-3                                ]-HelloWorld-SYSTEM:
wait max 30 seconds for queues to flush 
2017-06-24_20:43:54.482- INFO-[Thread-3                                ]-HelloWorld-SYSTEM:
all queues are flushed 
2017-06-24_20:43:54.482- INFO-[Thread-3                                ]-HelloWorld-SYSTEM:
pause all components configured with pause.on.shutdown=true... 
2017-06-24_20:43:54.482- INFO-[Thread-3                                ]-HelloWorld-SYSTEM:
all components set to pause 'on shutdown' have now paused 
2017-06-24_20:43:54.482- INFO-[Thread-3                                ]-ServiceResponder:
close service connection 
2017-06-24_20:43:54.482- INFO-[Thread-3                                ]-all services stopped
timely 
2017-06-24_20:43:54.482- INFO-[Thread-3                                ]-APP service 'TestC'
stopped. 

So at 20:43:54.482 the main java application shutdown is complete.

I then run jstack -F <pid> and see many threads left running indeed:
$ sc queryex TestC

SERVICE_NAME: TestC
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 3  STOP_PENDING
                                (NOT_STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x3
        WAIT_HINT          : 0x493e0
        PID                : 11088
        FLAGS              :

> D:\app_home\app-2.0.0.0.0-5145\etc
$ jstack -F 11088

Then now a good 20 minutes later the service is still in STATE STOP_PENDING:

The only other logs are from the proc daemon:
commons-daemon.2017-06-24.log
[2017-06-24 20:40:38] [debug] ( prunsrv.c:1679) [10484] Commons Daemon procrun log initialized
[2017-06-24 20:40:38] [info]  ( prunsrv.c:1683) [10484] Commons Daemon procrun (1.0.15.0 64-bit)
started
[2017-06-24 20:40:38] [debug] ( prunsrv.c:561 ) [10484] Installing service...
[2017-06-24 20:40:38] [info]  ( prunsrv.c:600 ) [10484] Service TestC name TestC
[2017-06-24 20:40:38] [debug] ( prunsrv.c:616 ) [10484] Setting service description FusionFabric
Connect (D:\APP_home\APP-2.0.0.0.0-5145\config\samples\HelloWorld)
[2017-06-24 20:40:38] [info]  ( prunsrv.c:634 ) [10484] Service 'TestC' installed
[2017-06-24 20:40:38] [info]  ( prunsrv.c:1764) [10484] Commons Daemon procrun finished
[2017-06-24 20:40:55] [debug] ( prunsrv.c:1679) [ 7180] Commons Daemon procrun log initialized
[2017-06-24 20:40:55] [info]  ( prunsrv.c:1683) [ 7180] Commons Daemon procrun (1.0.15.0 64-bit)
started
[2017-06-24 20:40:55] [info]  ( prunsrv.c:1596) [ 7180] Running 'TestC' Service...
[2017-06-24 20:40:55] [debug] ( prunsrv.c:1374) [ 8884] Inside ServiceMain...
[2017-06-24 20:40:55] [debug] ( prunsrv.c:844 ) [ 8884] reportServiceStatusE: 2, 0, 3000,
0
[2017-06-24 20:40:55] [info]  ( prunsrv.c:1127) [ 8884] Starting service...
[2017-06-24 20:40:55] [debug] ( javajni.c:233 ) [ 8884] loading jvm 'C:\Program Files\Java\jdk1.8.0_73\jre\bin\server\jvm.dll'
[2017-06-24 20:40:55] [debug] ( javajni.c:704 ) [ 2824] Jvm Option[0] -XX:+UseConcMarkSweepGC
[2017-06-24 20:40:55] [debug] ( javajni.c:704 ) [ 2824] Jvm Option[1] -XX:MaxGCPauseMillis=2000
[2017-06-24 20:40:55] [debug] ( javajni.c:704 ) [ 2824] Jvm Option[2] -XX:GCTimeRatio=10
[2017-06-24 20:40:55] [debug] ( javajni.c:704 ) [ 2824] Jvm Option[3] -XX:+CMSIncrementalMode
[2017-06-24 20:40:55] [debug] ( javajni.c:704 ) [ 2824] Jvm Option[4] -XX:+CMSIncrementalPacing
[2017-06-24 20:40:55] [debug] ( javajni.c:704 ) [ 2824] Jvm Option[5] -XX:+DisableExplicitGC
[2017-06-24 20:40:55] [debug] ( javajni.c:704 ) [ 2824] Jvm Option[6] -DAPP.LOG_DIR=D:\APP_home\APP-2.0.0.0.0-5145\config\samples\HelloWorld\logs
[2017-06-24 20:40:55] [debug] ( javajni.c:704 ) [ 2824] Jvm Option[7] -DAPP.DATA_DIR=D:\APP_home\APP-2.0.0.0.0-5145\config\samples\HelloWorld\data
[2017-06-24 20:40:55] [debug] ( javajni.c:704 ) [ 2824] Jvm Option[8] -Djava.class.path=.;D:\APP_home\APP-2.0.0.0.0-5145\bin\APP-api-2.0.0-00_00-RELEASE.jar;...
[2017-06-24 20:40:55] [debug] ( javajni.c:704 ) [ 2824] Jvm Option[9] exit
[2017-06-24 20:40:55] [debug] ( javajni.c:704 ) [ 2824] Jvm Option[10] -Xms512m
[2017-06-24 20:40:55] [debug] ( javajni.c:704 ) [ 2824] Jvm Option[11] -Xmx8192m
[2017-06-24 20:40:55] [debug] ( javajni.c:888 ) [ 2824] argv[0] = TestC
[2017-06-24 20:40:55] [debug] ( javajni.c:888 ) [ 2824] argv[1] = D:\APP_home\APP-2.0.0.0.0-5145\config\samples\HelloWorld\logs
[2017-06-24 20:40:55] [debug] ( javajni.c:888 ) [ 2824] argv[2] = director.properties
[2017-06-24 20:40:55] [debug] ( javajni.c:941 ) [ 2824] Java Worker thread started com/COMPANY/tools/integration/director/WindowsServiceWrapper:start
[2017-06-24 20:40:56] [debug] ( prunsrv.c:1186) [ 8884] Java started com/COMPANY/tools/integration/director/WindowsServiceWrapper
[2017-06-24 20:40:56] [info]  ( prunsrv.c:1284) [ 8884] Service started in 1453 ms.
[2017-06-24 20:40:56] [debug] ( prunsrv.c:844 ) [ 8884] reportServiceStatusE: 4, 0, 0, 0
[2017-06-24 20:40:56] [debug] ( prunsrv.c:1528) [ 8884] Waiting for worker to finish...
[2017-06-24 20:40:58] [debug] ( javajni.c:964 ) [ 2824] Java Worker thread finished com/COMPANY/tools/integration/director/WindowsServiceWrapper:start
with status=0
[2017-06-24 20:40:58] [debug] ( prunsrv.c:1533) [ 8884] Worker finished.
[2017-06-24 20:40:58] [debug] ( prunsrv.c:1559) [ 8884] Waiting for all threads to exit
[2017-06-24 20:43:54] [debug] ( prunsrv.c:844 ) [ 7180] reportServiceStatusE: 3, 0, 3000,
0
[2017-06-24 20:43:54] [info]  ( prunsrv.c:943 ) [10688] Stopping service...
[2017-06-24 20:43:54] [debug] ( javajni.c:888 ) [ 9936] argv[0] = TestC
[2017-06-24 20:43:54] [debug] ( javajni.c:941 ) [ 9936] Java Worker thread started com/COMPANY/tools/integration/director/WindowsServiceWrapper:stop
[2017-06-24 20:43:54] [debug] ( javajni.c:964 ) [ 9936] Java Worker thread finished com/COMPANY/tools/integration/director/WindowsServiceWrapper:stop
with status=0
[2017-06-24 20:43:55] [debug] ( prunsrv.c:990 ) [10688] Waiting for java jni stop worker to
finish...
[2017-06-24 20:43:55] [debug] ( prunsrv.c:992 ) [10688] Java jni stop worker finished.
[2017-06-24 20:43:55] [debug] ( prunsrv.c:844 ) [10688] reportServiceStatusE: 3, 0, 300000,
0
[2017-06-24 20:43:55] [debug] ( prunsrv.c:1093) [10688] Waiting for worker to die naturally...
[2017-06-24 20:43:55] [debug] ( prunsrv.c:1104) [10688] Worker finished gracefully in 0 ms.
[2017-06-24 20:43:55] [info]  ( prunsrv.c:1114) [10688] Service stop thread completed.

So you see that prunsrv.c reports stop thread completed while the process is still running
with pid 11088

!screenshot-1.png!

> Windows service fails to stop with error code 1053 (using Windows 7 service manager)
> ------------------------------------------------------------------------------------
>
>                 Key: DAEMON-366
>                 URL: https://issues.apache.org/jira/browse/DAEMON-366
>             Project: Commons Daemon
>          Issue Type: Bug
>          Components: Procrun
>    Affects Versions: 1.0.15
>         Environment: Windows 7 64bit - using 64-bit prunsrv.exe in jvm mode
> Using 64-bit Java JRE 1.8.0_73
>            Reporter: Consultant Leon
>            Assignee: Guillaume Chauvet
>              Labels: prunsrv, stop, windows
>         Attachments: screenshot-1.png
>
>
> Using Windows services manager (services.msc) to start/stop service.
> Service installed with the following start/stop settings.
> --StartMode=jvm --StartClass=com.myorg.MyClass --StartMethod=start 
> --StopMode=jvm --StopClass=com.myorg.MyClass --StopMethod=stop 
> Service starts OK.
> When I use services manager to stop the service, the application stops running but the
prunsrv.exe process does not terminate.
> Eventually the services manager reports that:
> "Windows could not stop the <MyServiceName> service on Local Computer.  Error 1053:
The service did not respond to the start or control request in a timely fashion."
> The service then stays at status "stopping" & cannot be restarted without first killing
the prunsrv.exe process.
> Looking at the application logs I can see that the application's stop() method gets called
- the final line in this method is a logging call which is executing OK.
> The commons-daemon-<date>.log file has the following entries:
> [2013-05-09 17:18:02] [info]  [10944] Commons Daemon procrun (1.0.15.0 32-bit) started
> [2013-05-09 17:18:02] [info]  [10944] Running '<MyServiceName>' Service...
> [2013-05-09 17:18:02] [info]  [ 9224] Starting service...
> [2013-05-09 17:18:11] [info]  [ 9224] Service started in 9202 ms.
> [2013-05-09 17:18:23] [info]  [10780] Stopping service...
> [2013-05-09 17:18:24] [info]  [10780] Service stop thread completed.
> This problem does not occur in daemon versions 11->14.
> I have only seen it in daemon-1.0.15



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message