storm-dev mailing list archives

From "Radim Kolar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (STORM-388) make supervisor more resilient to missing .ser files
Date Fri, 04 Jul 2014 10:40:33 GMT

    [ https://issues.apache.org/jira/browse/STORM-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052345#comment-14052345 ]

Radim Kolar commented on STORM-388:
-----------------------------------

There are two problems here:

The .ser files go missing because of bad synchronization (STORM-130), and the supervisor
exits because that error state is handled incorrectly (STORM-388).
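
To illustrate the error-handling half (STORM-388), here is a minimal sketch of treating a
missing stormconf.ser as a recoverable condition instead of halting the process. It is not
the actual supervisor code, which is written in Clojure in this release. The class
ResilientConfLoader and the methods readConfIfPresent and tryLaunchWorker are hypothetical
names, and "skip the launch and retry on a later sync" is an assumed policy, not a
confirmed fix.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Optional;

public class ResilientConfLoader {

    /**
     * Hypothetical helper: reads stormconf.ser from a topology's stormdist
     * directory. Returns empty if the file (or the directory) does not exist,
     * instead of letting the exception propagate to the event loop.
     */
    static Optional<byte[]> readConfIfPresent(Path stormDistDir) throws IOException {
        Path conf = stormDistDir.resolve("stormconf.ser");
        try {
            return Optional.of(Files.readAllBytes(conf));
        } catch (NoSuchFileException e) {
            return Optional.empty();
        }
    }

    /**
     * Hypothetical launch step: returns true if the worker would be launched,
     * false if the launch was skipped because the conf file is missing.
     */
    static boolean tryLaunchWorker(Path stormDistDir) throws IOException {
        Optional<byte[]> conf = readConfIfPresent(stormDistDir);
        if (!conf.isPresent()) {
            // Assumed policy: log and skip, leaving the retry to a later sync.
            System.err.println("stormconf.ser missing under " + stormDistDir
                    + "; skipping worker launch");
            return false;
        }
        // A real supervisor would deserialize the conf and start the worker here.
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("stormdist");
        // Conf file absent: the launch is skipped and the process keeps running.
        System.out.println(tryLaunchWorker(dir));
        Files.write(dir.resolve("stormconf.ser"), new byte[] {1, 2, 3});
        // Conf file present: the launch proceeds.
        System.out.println(tryLaunchWorker(dir));
    }
}
```

This only covers the symptom. The race that removes the code directory while a launch is
pending (STORM-130) would still need its own fix.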

> make supervisor more resilient to missing .ser files
> ----------------------------------------------------
>
>                 Key: STORM-388
>                 URL: https://issues.apache.org/jira/browse/STORM-388
>             Project: Apache Storm (Incubating)
>          Issue Type: Bug
>    Affects Versions: 0.9.2-incubating
>            Reporter: Radim Kolar
>              Labels: supervisor
>
> Currently the supervisor process cannot run reliably without some kind of process
> supervision software such as systemd. It exits too often on a missing .ser file error
> with "[INFO] Halting process".
> Examples:
> a)
> 2014-07-03 20:32:53 b.s.d.supervisor [INFO] Shutting down and clearing state for id efd37b78-eb69-46a1-b317-9b5b4ba00584. Current supervisor time: 1404412373. State: :timed-out, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1404412311, :storm-id "Storm-throughput-test-7-1404411531", :executors #{[2 2] [4 4] [6 6] [-1 -1]}, :port 6702}
> 2014-07-03 20:32:53 b.s.d.supervisor [INFO] Shutting down 55f2b426-c170-4e48-a768-2a82c0f383ce:efd37b78-eb69-46a1-b317-9b5b4ba00584
> 2014-07-03 20:32:54 b.s.d.supervisor [INFO] Removing code for storm id Storm-throughput-test-7-1404411531
> 2014-07-03 20:32:55 b.s.d.supervisor [INFO] Shut down 55f2b426-c170-4e48-a768-2a82c0f383ce:efd37b78-eb69-46a1-b317-9b5b4ba00584
> 2014-07-03 20:32:55 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "Storm-throughput-test-7-1404411531", :executors ([6 6] [4 4] [2 2])} for this supervisor 55f2b426-c170-4e48-a768-2a82c0f383ce on port 6702 with id 6518a348-1fea-4401-8b7b-365b4ac36279
> 2014-07-03 20:32:55 b.s.event [ERROR] Error when processing event
> java.io.FileNotFoundException: File 'storm-local/supervisor/stormdist/Storm-throughput-test-7-1404411531/stormconf.ser' does not exist
> b)
> 2014-07-03 20:32:43 o.a.z.ClientCnxn [INFO] Socket connection established to localhost/127.0.0.1:2181, initiating session
> 2014-07-03 20:32:51 o.a.z.ClientCnxn [INFO] Unable to reconnect to ZooKeeper service, session 0x146fb27b8400027 has expired, closing socket connection
> 2014-07-03 20:32:51 o.a.c.f.s.ConnectionStateManager [INFO] State change: LOST
> 8d-1069-44e3-b3ca-c25390cbf719
> 2014-07-03 10:29:22 b.s.d.supervisor [INFO] Removing code for storm id Storm-throughput-test-1-1404335149
> 2014-07-03 10:29:22 b.s.d.supervisor [INFO] Shut down 167cf900-2ec6-499b-9c09-12c1e48dbc08:f776588d-1069-44e3-b3ca-c25390cbf719
> 2014-07-03 10:29:22 b.s.d.supervisor [INFO] Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "Storm-throughput-test-1-1404335149", :executors ([3 3] [5 5] [4 4] [2 2] [1 1])} for this supervisor 167cf900-2ec6-499b-9c09-12c1e48dbc08 on port 6702 with id 1dd28a8e-53cd-4af3-a4ae-7ebae0b9427f
> 2014-07-03 10:29:22 b.s.event [ERROR] Error when processing event
> java.io.FileNotFoundException: File 'storm-local/supervisor/stormdist/Storm-throughput-test-1-1404335149/stormconf.ser' does not exist
> In both cases, ZooKeeper connection failures occurred before the missing .ser file
> error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
