storm-dev mailing list archives

From "Radim Kolar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (STORM-388) make supervisor more resilient to missing .ser files
Date Thu, 03 Jul 2014 22:14:34 GMT
Radim Kolar created STORM-388:
---------------------------------

             Summary: make supervisor more resilient to missing .ser files
                 Key: STORM-388
                 URL: https://issues.apache.org/jira/browse/STORM-388
             Project: Apache Storm (Incubating)
          Issue Type: Bug
    Affects Versions: 0.9.2-incubating
            Reporter: Radim Kolar


Currently the supervisor process cannot run reliably without some kind of process supervision
software such as systemd. It exits too often on a missing .ser file error, logging
"[INFO] Halting process".

examples:

a)

2014-07-03 20:32:53 b.s.d.supervisor [INFO] Shutting down and clearing state for id efd37b78-eb69-46a1-b317-9b5b4ba00584. Current supervisor time: 1404412373. State: :timed-out, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1404412311, :storm-id "Storm-throughput-test-7-1404411531", :executors #{[2 2] [4 4] [6 6] [-1 -1]}, :port 6702}
2014-07-03 20:32:53 b.s.d.supervisor [INFO] Shutting down 55f2b426-c170-4e48-a768-2a82c0f383ce:efd37b78-eb69-46a1-b317-9b5b4ba00584
2014-07-03 20:32:54 b.s.d.supervisor [INFO] Removing code for storm id Storm-throughput-test-7-1404411531
2014-07-03 20:32:55 b.s.d.supervisor [INFO] Shut down 55f2b426-c170-4e48-a768-2a82c0f383ce:efd37b78-eb69-46a1-b317-9b5b4ba00584
2014-07-03 20:32:55 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "Storm-throughput-test-7-1404411531", :executors ([6 6] [4 4] [2 2])} for this supervisor 55f2b426-c170-4e48-a768-2a82c0f383ce on port 6702 with id 6518a348-1fea-4401-8b7b-365b4ac36279
2014-07-03 20:32:55 b.s.event [ERROR] Error when processing event
java.io.FileNotFoundException: File 'storm-local/supervisor/stormdist/Storm-throughput-test-7-1404411531/stormconf.ser' does not exist

b)

2014-07-03 20:32:43 o.a.z.ClientCnxn [INFO] Socket connection established to localhost/127.0.0.1:2181, initiating session
2014-07-03 20:32:51 o.a.z.ClientCnxn [INFO] Unable to reconnect to ZooKeeper service, session 0x146fb27b8400027 has expired, closing socket connection
2014-07-03 20:32:51 o.a.c.f.s.ConnectionStateManager [INFO] State change: LOST
8d-1069-44e3-b3ca-c25390cbf719
2014-07-03 10:29:22 b.s.d.supervisor [INFO] Removing code for storm id Storm-throughput-test-1-1404335149
2014-07-03 10:29:22 b.s.d.supervisor [INFO] Shut down 167cf900-2ec6-499b-9c09-12c1e48dbc08:f776588d-1069-44e3-b3ca-c25390cbf719
2014-07-03 10:29:22 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "Storm-throughput-test-1-1404335149", :executors ([3 3] [5 5] [4 4] [2 2] [1 1])} for this supervisor 167cf900-2ec6-499b-9c09-12c1e48dbc08 on port 6702 with id 1dd28a8e-53cd-4af3-a4ae-7ebae0b9427f
2014-07-03 10:29:22 b.s.event [ERROR] Error when processing event
java.io.FileNotFoundException: File 'storm-local/supervisor/stormdist/Storm-throughput-test-1-1404335149/stormconf.ser' does not exist

In both cases there were ZooKeeper connection problems before the missing .ser file error.
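
To illustrate the kind of resilience being asked for, here is a rough sketch in Python. It is
not the supervisor's actual code (that lives in the backtype.storm.daemon.supervisor Clojure
namespace), and every name in it (read_topology_conf, launch_assigned_workers, the launch
callback) is hypothetical. pickle stands in for Java deserialization of stormconf.ser. The
idea is only the control flow: in both logs "Removing code for storm id" is followed by
"Launching worker" for the same id, so a missing stormconf.ser could be treated as "code not
available yet, skip this assignment" rather than an error that halts the process.

```python
import os
import pickle


def read_topology_conf(stormdist_dir, storm_id):
    """Return the deserialized conf, or None if stormconf.ser is missing.

    Hypothetical helper; pickle is only a stand-in for Java deserialization.
    """
    path = os.path.join(stormdist_dir, storm_id, "stormconf.ser")
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        # Code was removed (or not downloaded yet): report it, do not exit.
        return None


def launch_assigned_workers(stormdist_dir, assignments, launch):
    """Launch workers whose code is present; return the ports that were skipped.

    assignments maps port -> storm id; launch is a caller-supplied callback.
    Skipped ports are left for a later sync pass instead of halting.
    """
    skipped = []
    for port, storm_id in sorted(assignments.items()):
        conf = read_topology_conf(stormdist_dir, storm_id)
        if conf is None:
            skipped.append(port)
            continue
        launch(port, storm_id, conf)
    return skipped
```

Whether skipping is the right recovery (as opposed to re-downloading the code first) is an
open design question; the sketch only shows that the missing file need not be fatal.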



--
This message was sent by Atlassian JIRA
(v6.2#6252)
