Return-Path: X-Original-To: apmail-storm-user-archive@minotaur.apache.org Delivered-To: apmail-storm-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 075B6105C8 for ; Fri, 14 Feb 2014 02:35:19 +0000 (UTC) Received: (qmail 96062 invoked by uid 500); 14 Feb 2014 02:35:17 -0000 Delivered-To: apmail-storm-user-archive@storm.apache.org Received: (qmail 95841 invoked by uid 500); 14 Feb 2014 02:35:16 -0000 Mailing-List: contact user-help@storm.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@storm.incubator.apache.org Delivered-To: mailing list user@storm.incubator.apache.org Received: (qmail 95819 invoked by uid 99); 14 Feb 2014 02:35:16 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 14 Feb 2014 02:35:16 +0000 X-ASF-Spam-Status: No, hits=-0.7 required=5.0 tests=NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_PASS,WEIRD_PORT X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of ptgoetz@gmail.com designates 209.85.216.47 as permitted sender) Received: from [209.85.216.47] (HELO mail-qa0-f47.google.com) (209.85.216.47) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 14 Feb 2014 02:35:11 +0000 Received: by mail-qa0-f47.google.com with SMTP id j5so17224572qaq.20 for ; Thu, 13 Feb 2014 18:34:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :message-id:references:to; bh=1H32YdKQbB5AG4vxszZR5ciTqycpnjr3FOzP56Zb+gc=; b=DOJW7Cx6JjvY2rKP6B8uF1Xp7fUixWe7DAmD/suQWqZMVZwpwJrVmm0DACRUAiwIOO m45lbCrhd8fMoAp8Nkl5quz/irk6FLu9vDJQN6tFRAmxqIpAkTigceNjlRr8ODBFfHfz eyYkBppYYJv71pNwysgdJAA7x0o1BGVo99h0PyjlquHvvGf+zrqJXhDMKGOQ4bafXOTv dpEvo5E/4njONsbNTXRMBlqdd9gH0ivm+aM2EoFAsV3XJJncZk3yhKekunNjW5+tJdSl XILrAisa2N3HLGMdigeb3D1OfjAirp+FkJakm99BECn0smWdJmkq3wn58ArkrpJbhVMO ayXQ== X-Received: by 10.224.165.133 with SMTP id i5mr8610023qay.75.1392345290848; Thu, 13 Feb 2014 18:34:50 -0800 (PST) Received: from thidwick.local (pool-173-59-54-41.phlapa.fios.verizon.net. [173.59.54.41]) by mx.google.com with ESMTPSA id 30sm5697510qgt.4.2014.02.13.18.34.48 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 13 Feb 2014 18:34:49 -0800 (PST) Content-Type: multipart/signed; boundary="Apple-Mail=_6143E698-02FF-4F0D-BC11-5B83C12C73EE"; protocol="application/pgp-signature"; micalg=pgp-sha512 Mime-Version: 1.0 (Mac OS X Mail 7.1 \(1827\)) Subject: Re: Storm 0.9.0.1 - topology halts processing after a while From: "P. Taylor Goetz" In-Reply-To: <20140213161036.GA49676@animetrics.com> Date: Thu, 13 Feb 2014 21:34:46 -0500 Cc: dev@storm.incubator.apache.org Message-Id: <1730D15D-03DB-4108-AA9B-F8478FE012F0@gmail.com> References: <66318FBB-816D-47E6-B216-177063067C52@gmail.com> <20140213161036.GA49676@animetrics.com> To: user@storm.incubator.apache.org X-Mailer: Apple Mail (2.1827) X-Virus-Checked: Checked by ClamAV on apache.org --Apple-Mail=_6143E698-02FF-4F0D-BC11-5B83C12C73EE Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252 Yeah, we had to disable issues on nathanmarz/storm in order to get = everyone to start using the Apache infrastructure. An unfortunate side = effect of that was that now links to old github issues return a 404. All the github issues before the move have been migrated to JIRA now = (https://issues.apache.org/jira/browse/STORM). You may have to do some = searching to find them, but James Xu did a lot of work to ensure that = most of the comments got transferred to JIRA. I=92ll consolidate them under a single JIRA when I get the chance, but = the relevant issues are: STORM-202, STORM-105, STORM-29, STORM-143, and STORM-95 Or use this query: = https://issues.apache.org/jira/browse/STORM-202?jql=3Dproject%20%3D%20STOR= M%20AND%20text%20~%20%22stdout%22 It would be great if we could come up with a proper solution. For those = who have been bitten by this it=92s painful =97 zero indication of what = the issue is. At the very least, it should be well documented. Heisenbugs suck. ;) - Taylor On Feb 13, 2014, at 11:10 AM, Marc Vaillant = wrote: > I was going to mention something similar. The stdout/stderr issue is > insidious. Only happens in cluster mode and while it was fixed for = some > stdout/stderr pathways, others have not been fixed AFAIK. Taylor has > mentioned one, another is stdout/stderr coming from native code via = JNI > (i.e. printf, std::cout, std::cerr from C and C++). Unfortunately, = I > seem to be unable to access the original issue thread from github that = I > tried to reopen about 9 months ago.=20 >=20 > Marc >=20 > On Wed, Feb 12, 2014 at 03:50:52PM -0500, P. Taylor Goetz wrote: >> It looks like you have GC logging turned on, but you haven t = specified an >> output file: >>=20 >> "worker.childopts" "-verbose:gc -XX:+PrintGCTimeStamps = -XX:+PrintGCDetails >> -Dcom.sun.management.jmxremote = -Dcom.sun.management.jmxremote.ssl=3Dfalse >> -Dcom.sun.management.jmxremote.authenticate=3Dfalse >> -Dcom.sun.management.jmxremote.port=3D1%ID% -Xmx1200m=20 >>=20 >> I ve run into this before: >>=20 >> "The root cause was that GC logging for workers had been turned on = without >> specifying an output file. As a result the GC logging went to = standard output >> without being redirected to logback. Eventually the buffer filled and = the JVM >> would hang (and thus stop heartbeating) and be killed. The amount of = time the >> worker would last depended on memory pressure and allocated heap size >> (obviously, in hindsight the more GC, the more GC logging, the = faster the >> buffer fills). >>=20 >> The symptoms were workers timing out and being killed for no apparent = reason.=20 >>=20 >> I would try with GC logging turned off or redirected to a file. >>=20 >> - Taylor >>=20 >> On Feb 12, 2014, at 3:12 PM, Mark Hu wrote: >>=20 >>=20 >> Hi guys, >>=20 >>=20 >> Just an update. >>=20 >> 1. After investigating the hang, it looked like the Spout was = properly >> sending tuples to a bolt, but the bolt would never acknowledge the = incoming >> tuple. After switching from using the Netty transport layer back = to 0MQ, >> recovery and reliability guarantees were restored. Please note = that we're >> using jruby-1.7.3 on top of storm java - could this cause issues = with >> netty? >> 2. Can anyone confirm if the above worker restart behaviour is = normal? >>=20 >> Thanks, >> Mark >>=20 >>=20 >>=20 >> On Feb 6, 2014, at 10:20 PM, Mark Hu wrote: >>=20 >>=20 >> HISTORY: >> I have past experience with Storm 0.8.2 and encountered silent = worker death >> during development. After experimenting with various parallelism, = worker, >> and max_spout configurations, we were able to find one that worked = well >> with high reliability despite this issue.=20 >>=20 >> The silent worker death basically involves a worker not responding = and >> waiting 30 seconds before starting to process work again. A full = load test >> reveals no data loss here. >>=20 >> I have this topology running well for the past 3 months and the = only errors >> were due to the user. >>=20 >> PRESENT: >> I am now testing Storm 0.9.0.1 and it appears the silent worker = death issue >> is still present, but now the topology hangs. Here are the = details: >>=20 >> Type and description for my test topology >> 1. One spout - makes web calls to a rest service to retrieve ids. >> 2. One bolt - matchRetrieveBolt - retrieves blobs for those ids. >> 3. 50 bolts - matchProcessBolt - does some processing on the = blobs. >>=20 >> Storm performed swimmingly for about 63,000 blobs before it hung. = After >> examining the logs, it appears that one worker was processing both = bolt >> type 2 and bolt type 3. For some reason, a reset occurs and the = bolts >> reinitialize but become unresponsive. The spout continues to send = work but >> times out and retries. >>=20 >> Here is the data for worker-6710.log: >>=20 >>=20 >> 2014-02-07 01:31:26 MatchProcessBolt [INFO] = [BOLT-MatchProcessBolt] bitfield already allocated, ttl: -1 >> 2014-02-07 01:31:26 MatchProcessBolt [INFO] = [BOLT-MatchProcessBolt] All done. Acking now >> 2014-02-07 01:31:26 MatchRetrieveBolt [INFO] [BOLT-1] process: = first =3D 66300, last =3D 66350 >> 2014-02-07 01:31:26 MatchRetrieveBolt [INFO] [BOLT-1] url: SOMEURL >>=20 >>=20 >> As per the above, this worker is performing bolt1 and bolt2 work. >>=20 >> Suddenly, a restart occurs as per the below: >>=20 >>=20 >> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client = environment:zookeeper.version=3D3.3.3-1073969, built on 02/23/2011 22:27 = GMT >> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client = environment:host.name=3Ddev-shared2-supervisor-i-e67015c7 >> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client = environment:java.version=3D1.6.0_27 >> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client = environment:java.vendor=3DSun Microsystems Inc. >> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client = environment:java.home=3D/usr/lib/jvm/java-6-openjdk-amd64/jre >> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client = environment:java.class.path=3D/opt/storm/storm-0.9.0.1/storm-netty-0.9.0.1= .jar:/opt/storm/storm-0.9.0.1/storm-console-logging-0.9.0.1.jar:/opt/storm= /storm-0.9.0.1/storm-core-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/lib/jline-0= .9.94.jar:/opt/storm/storm-0.9.0.1/lib/servlet-api-2.5.jar:/opt/storm/stor= m-0.9.0.1/lib/minlog-1.2.jar:/opt/storm/storm-0.9.0.1/lib/curator-framewor= k-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/snakeyaml-1.11.jar:/opt/storm/sto= rm-0.9.0.1/lib/log4j-over-slf4j-1.6.6.jar:/opt/storm/storm-0.9.0.1/lib/clj= -time-0.4.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-core-1.1.5.jar:/opt/stor= m/storm-0.9.0.1/lib/mockito-all-1.9.5.jar:/opt/storm/storm-0.9.0.1/lib/too= ls.logging-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/disruptor-2.10.1.jar:/op= t/storm/storm-0.9.0.1/lib/zookeeper-3.3.3.jar:/opt/storm/storm-0.9.0.1/lib= /servlet-api-2.5-20081211.jar:/opt/storm/storm-0.9.0.1/lib/carbonite-1.5.0= .jar:/opt/storm/storm-0.9.0.1/lib/guava-13.0.jar:/opt/storm/storm-0.9.0.1/= lib/commo> = ns-fileupload-1.2.1.jar:/opt/storm/storm-0.9.0.1/lib/joda-time-2.0.jar:/op= t/storm/storm-0.9.0.1/lib/jgrapht-0.8.3.jar:/opt/storm/storm-0.9.0.1/lib/n= etty-3.6.3.Final.jar:/opt/storm/storm-0.9.0.1/lib/clojure-complete-0.2.3.j= ar:/opt/storm/storm-0.9.0.1/lib/tools.cli-0.2.2.jar:/opt/storm/storm-0.9.0= .1/lib/logback-classic-1.0.6.jar:/opt/storm/storm-0.9.0.1/lib/ring-servlet= -0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/asm-4.0.jar:/opt/storm/storm-0.9.= 0.1/lib/commons-exec-1.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-jetty-adapt= er-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/commons-lang-2.5.jar:/opt/storm= /storm-0.9.0.1/lib/commons-logging-1.1.1.jar:/opt/storm/storm-0.9.0.1/lib/= clout-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/clj-stacktrace-0.2.2.jar:/opt= /storm/storm-0.9.0.1/lib/reflectasm-1.07-shaded.jar:/opt/storm/storm-0.9.0= .1/lib/junit-3.8.1.jar:/opt/storm/storm-0.9.0.1/lib/jzmq-2.1.0.jar:/opt/st= orm/storm-0.9.0.1/lib/jetty-6.1.26.jar:/opt/storm/storm-0.9.0.1/lib/kryo-2= .17.jar:/opt/storm/storm-0.9.0.1/lib/ring-devel-0.3.11.jar:/o> = pt/storm/storm-0.9.0.1/lib/hiccup-0.3.6.jar:/opt/storm/storm-0.9.0.1/lib/c= ompojure-1.1.3.jar:/opt/storm/storm-0.9.0.1/lib/jetty-util-6.1.26.jar:/opt= /storm/storm-0.9.0.1/lib/commons-io-1.4.jar:/opt/storm/storm-0.9.0.1/lib/s= lf4j-api-1.6.5.jar:/opt/storm/storm-0.9.0.1/lib/clojure-1.4.0.jar:/opt/sto= rm/storm-0.9.0.1/lib/tools.nrepl-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/cu= rator-client-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/httpcore-4.1.jar:/opt/= storm/storm-0.9.0.1/lib/logback-core-1.0.6.jar:/opt/storm/storm-0.9.0.1/li= b/commons-codec-1.4.jar:/opt/storm/storm-0.9.0.1/lib/json-simple-1.1.jar:/= opt/storm/storm-0.9.0.1/lib/core.incubator-0.1.0.jar:/opt/storm/storm-0.9.= 0.1/lib/libthrift7-0.7.0-2.jar:/opt/storm/storm-0.9.0.1/lib/tools.macro-0.= 1.0.jar:/opt/storm/storm-0.9.0.1/lib/httpclient-4.1.1.jar:/opt/storm/storm= -0.9.0.1/lib/objenesis-1.2.jar:/opt/storm/storm-0.9.0.1/lib/math.numeric-t= ower-0.0.1.jar:/opt/storm/storm-0.9.0.1/conf:/mnt/storm/supervisor/stormdi= st/match_parser-12-1391735866/stormjar.jar >> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client = environment:java.library.path=3D/usr/local/lib:/opt/local/lib:/usr/lib >> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client = environment:java.io.tmpdir=3D/tmp >> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client = environment:java.compiler=3D >> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client = environment:os.name=3DLinux >> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client = environment:os.arch=3Damd64 >> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client = environment:os.version=3D3.2.0-48-virtual >> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client = environment:user.name=3Dstorm >> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client = environment:user.home=3D/home/storm >> 2014-02-07 01:32:00 o.a.z.ZooKeeper [INFO] Client = environment:user.dir=3D/opt/storm/storm-0.9.0.1 >> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server = environment:zookeeper.version=3D3.3.3-1073969, built on 02/23/2011 22:27 = GMT >> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server = environment:host.name=3Ddev-shared2-supervisor-i-e67015c7 >> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server = environment:java.version=3D1.6.0_27 >> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server = environment:java.vendor=3DSun Microsystems Inc. >> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server = environment:java.home=3D/usr/lib/jvm/java-6-openjdk-amd64/jre >> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server = environment:java.class.path=3D/opt/storm/storm-0.9.0.1/storm-netty-0.9.0.1= .jar:/opt/storm/storm-0.9.0.1/storm-console-logging-0.9.0.1.jar:/opt/storm= /storm-0.9.0.1/storm-core-0.9.0.1.jar:/opt/storm/storm-0.9.0.1/lib/jline-0= .9.94.jar:/opt/storm/storm-0.9.0.1/lib/servlet-api-2.5.jar:/opt/storm/stor= m-0.9.0.1/lib/minlog-1.2.jar:/opt/storm/storm-0.9.0.1/lib/curator-framewor= k-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/snakeyaml-1.11.jar:/opt/storm/sto= rm-0.9.0.1/lib/log4j-over-slf4j-1.6.6.jar:/opt/storm/storm-0.9.0.1/lib/clj= -time-0.4.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-core-1.1.5.jar:/opt/stor= m/storm-0.9.0.1/lib/mockito-all-1.9.5.jar:/opt/storm/storm-0.9.0.1/lib/too= ls.logging-0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/disruptor-2.10.1.jar:/op= t/storm/storm-0.9.0.1/lib/zookeeper-3.3.3.jar:/opt/storm/storm-0.9.0.1/lib= /servlet-api-2.5-20081211.jar:/opt/storm/storm-0.9.0.1/lib/carbonite-1.5.0= .jar:/opt/storm/storm-0.9.0.1/lib/guava-13.0.jar:/opt/storm/storm-0.9.0.1/= l> = ib/commons-fileupload-1.2.1.jar:/opt/storm/storm-0.9.0.1/lib/joda-time-2.0= .jar:/opt/storm/storm-0.9.0.1/lib/jgrapht-0.8.3.jar:/opt/storm/storm-0.9.0= .1/lib/netty-3.6.3.Final.jar:/opt/storm/storm-0.9.0.1/lib/clojure-complete= -0.2.3.jar:/opt/storm/storm-0.9.0.1/lib/tools.cli-0.2.2.jar:/opt/storm/sto= rm-0.9.0.1/lib/logback-classic-1.0.6.jar:/opt/storm/storm-0.9.0.1/lib/ring= -servlet-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/asm-4.0.jar:/opt/storm/st= orm-0.9.0.1/lib/commons-exec-1.1.jar:/opt/storm/storm-0.9.0.1/lib/ring-jet= ty-adapter-0.3.11.jar:/opt/storm/storm-0.9.0.1/lib/commons-lang-2.5.jar:/o= pt/storm/storm-0.9.0.1/lib/commons-logging-1.1.1.jar:/opt/storm/storm-0.9.= 0.1/lib/clout-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/clj-stacktrace-0.2.2.= jar:/opt/storm/storm-0.9.0.1/lib/reflectasm-1.07-shaded.jar:/opt/storm/sto= rm-0.9.0.1/lib/junit-3.8.1.jar:/opt/storm/storm-0.9.0.1/lib/jzmq-2.1.0.jar= :/opt/storm/storm-0.9.0.1/lib/jetty-6.1.26.jar:/opt/storm/storm-0.9.0.1/li= b/kryo-2.17.jar:/opt/storm/storm-0.9.0.1/lib/ring-devel-0.3.1> = 1.jar:/opt/storm/storm-0.9.0.1/lib/hiccup-0.3.6.jar:/opt/storm/storm-0.9.0= .1/lib/compojure-1.1.3.jar:/opt/storm/storm-0.9.0.1/lib/jetty-util-6.1.26.= jar:/opt/storm/storm-0.9.0.1/lib/commons-io-1.4.jar:/opt/storm/storm-0.9.0= .1/lib/slf4j-api-1.6.5.jar:/opt/storm/storm-0.9.0.1/lib/clojure-1.4.0.jar:= /opt/storm/storm-0.9.0.1/lib/tools.nrepl-0.2.3.jar:/opt/storm/storm-0.9.0.= 1/lib/curator-client-1.0.1.jar:/opt/storm/storm-0.9.0.1/lib/httpcore-4.1.j= ar:/opt/storm/storm-0.9.0.1/lib/logback-core-1.0.6.jar:/opt/storm/storm-0.= 9.0.1/lib/commons-codec-1.4.jar:/opt/storm/storm-0.9.0.1/lib/json-simple-1= .1.jar:/opt/storm/storm-0.9.0.1/lib/core.incubator-0.1.0.jar:/opt/storm/st= orm-0.9.0.1/lib/libthrift7-0.7.0-2.jar:/opt/storm/storm-0.9.0.1/lib/tools.= macro-0.1.0.jar:/opt/storm/storm-0.9.0.1/lib/httpclient-4.1.1.jar:/opt/sto= rm/storm-0.9.0.1/lib/objenesis-1.2.jar:/opt/storm/storm-0.9.0.1/lib/math.n= umeric-tower-0.0.1.jar:/opt/storm/storm-0.9.0.1/conf:/mnt/storm/supervisor= /stormdist/match_parser-12-1391735866/stormjar.jar >> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server = environment:java.library.path=3D/usr/local/lib:/opt/local/lib:/usr/lib >> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server = environment:java.io.tmpdir=3D/tmp >> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server = environment:java.compiler=3D >> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server = environment:os.name=3DLinux >> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server = environment:os.arch=3Damd64 >> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server = environment:os.version=3D3.2.0-48-virtual >> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server = environment:user.name=3Dstorm >> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server = environment:user.home=3D/home/storm >> 2014-02-07 01:32:00 o.a.z.s.ZooKeeperServer [INFO] Server = environment:user.dir=3D/opt/storm/storm-0.9.0.1 >> 2014-02-07 01:32:01 b.s.d.worker [INFO] Launching worker for = match_parser-12-1391735866 on f1b66213-7237-483e-a5db-e543bb36bcd9:6710 = with id 80483ed6-2c51-4f86-9d06-9d2bed705748 and conf = {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", = "topology.tick.tuple.freq.secs" nil, = "topology.builtin.metrics.bucket.size.secs" 60, = "topology.fall.back.on.java.serialization" true, = "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000, = "topology.skip.missing.kryo.registrations" false, = "storm.messaging.netty.client_worker_threads" 1, "ui.childopts" = "-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign" = true, "topology.trident.batch.emit.interval.millis" 500, = "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", = "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", = "topology.executor.send.buffer.size" 1024, "storm.local.dir" = "/mnt/storm", "storm.messaging.netty.buffer_size" 5242880, = "supervisor.worker.start.timeout.secs" 120, = "topology.enable.message.timeouts" true, "nim> = bus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" = 3600, "drpc.worker.threads" 64, = "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "10.0.5.50", = "storm.messaging.netty.min_wait_ms" 100, "storm.zookeeper.port" 2181, = "transactional.zookeeper.port" nil, = "topology.executor.receive.buffer.size" 1024, = "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", = "storm.zookeeper.retry.intervalceiling.millis" 30000, = "supervisor.enable" true, "storm.messaging.netty.server_worker_threads" = 1, "storm.zookeeper.servers" ["10.0.6.131" "10.0.6.130" "10.0.6.55"], = "transactional.zookeeper.root" "/transactional", = "topology.acker.executors" nil, "topology.transfer.buffer.size" 1024, = "topology.worker.childopts" nil, "drpc.queue.size" 128, = "worker.childopts" "-verbose:gc -XX:+PrintGCTimeStamps = -XX:+PrintGCDetails -Dcom.sun.management.jmxremote = -Dcom.sun.management.jmxremote.ssl=3Dfalse = -Dcom.sun.management.jmxremote.authenticate=3Dfalse = -Dcom.sun.management.jmxremote.port=3D1%ID% -Xmx1200m", "sup> = ervisor.heartbeat.frequency.secs" 5, = "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" = 3772, "supervisor.monitor.frequency.secs" 3, "drpc.childopts" = "-Xmx768m", "topology.receiver.buffer.size" 8, = "task.heartbeat.frequency.secs" 3, "topology.tasks" nil, = "storm.messaging.netty.max_retries" 100, "topology.spout.wait.strategy" = "backtype.storm.spout.SleepSpoutWaitStrategy", = "topology.max.spout.pending" nil, "storm.zookeeper.retry.interval" 1000, = "topology.sleep.spout.wait.strategy.time.ms" 1, = "nimbus.topology.validator" = "backtype.storm.nimbus.DefaultTopologyValidator", = "supervisor.slots.ports" [6700 6701 6702 6703 6704 6705 6706 6707 6708 = 6709 6710 6711 6712 6713 6714 6715 6716 6717 6718 6719 6720 6721], = "topology.debug" false, "nimbus.task.launch.secs" 120, = "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs" 30, = "task.refresh.poll.secs" 10, "topology.workers" 1, = "supervisor.childopts" "-verbose:gc -XX:+PrintGCTimeStamps = -XX:+PrintGCDetails -Dcom.sun.management.jmxremote -Dcom.> = sun.management.jmxremote.ssl=3Dfalse = -Dcom.sun.management.jmxremote.authenticate=3Dfalse = -Dcom.sun.management.jmxremote.port=3D8091", "nimbus.thrift.port" 6627, = "topology.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, = "topology.tuple.serializer" = "backtype.storm.serialization.types.ListDelegateSerializer", = "topology.disruptor.wait.strategy" = "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" = 30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" = "backtype.storm.serialization.DefaultKryoFactory", = "drpc.invocations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, = "storm.zookeeper.retry.times" 5, "storm.thrift.transport" = "backtype.storm.security.auth.SimpleTransportPlugin", = "topology.state.synchronization.timeout.secs" 60, = "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" = 600, "storm.messaging.transport" = "backtype.storm.messaging.netty.Context", "logviewer.appender.name" = "A1", "storm.messaging.netty.max_wait_ms" 1000, = "drpc.request.timeout.secs> " 600, "storm.local.mode.zmq" false, = "ui.port" 8080, "nimbus.childopts" "-verbose:gc -XX:+PrintGCTimeStamps = -XX:+PrintGCDetails -Dcom.sun.management.jmxremote = -Dcom.sun.management.jmxremote.ssl=3Dfalse = -Dcom.sun.management.jmxremote.authenticate=3Dfalse = -Dcom.sun.management.jmxremote.port=3D8090", "storm.cluster.mode" = "distributed", "topology.optimize" true, "topology.max.task.parallelism" = nil} >> 2014-02-07 01:32:01 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting >> 2014-02-07 01:32:01 o.a.z.ZooKeeper [INFO] Initiating client = connection, connectString=3D10.0.6.131:2181,10.0.6.130:2181,10.0.6.55:2181= sessionTimeout=3D20000 = watcher=3Dcom.netflix.curator.ConnectionState@25de152f >> 2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Opening socket = connection to server /10.0.6.131:2181 >> 2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Socket connection = established to ip-10-0-6-131.ec2.internal/10.0.6.131:2181, initiating = session >> 2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Session establishment = complete on server ip-10-0-6-131.ec2.internal/10.0.6.131:2181, sessionid = =3D 0xf343fa0ee4d600f0, negotiated timeout =3D 20000 >> 2014-02-07 01:32:01 b.s.zookeeper [INFO] Zookeeper state update: = :connected:none >> 2014-02-07 01:32:01 o.a.z.ZooKeeper [INFO] Session: = 0xf343fa0ee4d600f0 closed >> 2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] EventThread shut down >> 2014-02-07 01:32:01 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting >> 2014-02-07 01:32:01 o.a.z.ZooKeeper [INFO] Initiating client = connection, = connectString=3D10.0.6.131:2181,10.0.6.130:2181,10.0.6.55:2181/storm = sessionTimeout=3D20000 = watcher=3Dcom.netflix.curator.ConnectionState@31930611 >> 2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Opening socket = connection to server /10.0.6.131:2181 >> 2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Socket connection = established to ip-10-0-6-131.ec2.internal/10.0.6.131:2181, initiating = session >> 2014-02-07 01:32:01 o.a.z.ClientCnxn [INFO] Session establishment = complete on server ip-10-0-6-131.ec2.internal/10.0.6.131:2181, sessionid = =3D 0xf343fa0ee4d600f1, negotiated timeout =3D 20000 >> 2014-02-07 01:32:01 b.s.m.TransportFactory [INFO] Storm peer = transport plugin:backtype.storm.messaging.netty.Context >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor 2:[2 2] >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks = 2:[2 2] >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading = executor 2:[2 2] >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor = __acker:[102 102] >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt 2:(2) >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks = __acker:[102 102] >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Timeouts disabled for = executor __acker:[102 102] >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt = __acker:(102) >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading = executor __acker:[102 102] >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Prepared bolt = __acker:(102) >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor 3:[52 = 52] >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks = 3:[52 52] >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading = executor 3:[52 52] >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt 3:(52) >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loading executor = __system:[-1 -1] >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Loaded executor tasks = __system:[-1 -1] >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Finished loading = executor __system:[-1 -1] >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Preparing bolt = __system:(-1) >> 2014-02-07 01:32:01 b.s.d.worker [INFO] Launching receive-thread = for f1b66213-7237-483e-a5db-e543bb36bcd9:6710 >> 2014-02-07 01:32:01 b.s.d.executor [INFO] Prepared bolt = __system:(-1) >> 2014-02-07 01:32:02 b.s.d.worker [INFO] Worker has topology config = {"storm.id" "match_parser-12-1391735866", "dev.zookeeper.path" = "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, = "topology.builtin.metrics.bucket.size.secs" 60, = "topology.fall.back.on.java.serialization" true, = "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000, = "topology.skip.missing.kryo.registrations" false, = "storm.messaging.netty.client_worker_threads" 1, "ui.childopts" = "-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign" = true, "topology.trident.batch.emit.interval.millis" 500, = "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", = "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", = "executor.send.buffer.size" 16384, "topology.executor.send.buffer.size" = 1024, "storm.local.dir" "/mnt/storm", "iwparser_socket" = "/tmp/iwparser2Sockets/iwparserClients.ipc", = "storm.messaging.netty.buffer_size" 5242880, = "supervisor.worker.start.timeout.secs" 120, "topology.enable.message.t> = imeouts" true, "nimbus.cleanup.inbox.freq.secs" 600, "batch_size" 50, = "nimbus.inbox.jar.expiration.secs" 3600, = "aws_creds.aws_secret_access_key" = "wmGLQ4ItvdhniRFMJe3yDJvBwTJUTqbuLYAS5ZPU", "drpc.worker.threads" 64, = "redis.processed.match.log.size" 100000, "redis.key.match.ok" = "MATCH-OK", "topology.worker.shared.thread.pool.size" 4, "nimbus.host" = "10.0.5.50", "storm.messaging.netty.min_wait_ms" 100, = "storm.zookeeper.port" 2181, "transactional.zookeeper.port" nil, = "topology.executor.receive.buffer.size" 1024, = "aws_creds.aws_access_key_id" "AKIAJP4O6MTXON3MLV2Q", = "executor.receive.buffer.size" 16384, "transactional.zookeeper.servers" = nil, "storm.zookeeper.root" "/storm", "aws.s3.match_bucket_name" = "prod-matches", "storm.zookeeper.retry.intervalceiling.millis" 30000, = "supervisor.enable" true, "storm.messaging.netty.server_worker_threads" = 1, "storm.zookeeper.servers" ["10.0.6.131" "10.0.6.130" "10.0.6.55"], = "transactional.zookeeper.root" "/transactional", = "topology.acker.executors" 50, "topology.kryo.decorators" > (), = "topology.name" "match_parser", "topology.transfer.buffer.size" 1024, = "topology.worker.childopts" "-Djruby.compat.version=3DRUBY1_9", = "drpc.queue.size" 128, "worker.childopts" "-verbose:gc = -XX:+PrintGCTimeStamps -XX:+PrintGCDetails = -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=3Dfalse = -Dcom.sun.management.jmxremote.authenticate=3Dfalse = -Dcom.sun.management.jmxremote.port=3D1%ID% -Xmx1200m", = "supervisor.heartbeat.frequency.secs" 5, "transfer.buffer.size" 32, = "topology.error.throttle.interval.secs" 10, "throttle.spout" 1, = "zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor.frequency.secs" 3, = "drpc.childopts" "-Xmx768m", "topology.receiver.buffer.size" 8, = "task.heartbeat.frequency.secs" 3, "topology.tasks" nil, = "storm.messaging.netty.max_retries" 100, "topology.spout.wait.strategy" = "backtype.storm.spout.SleepSpoutWaitStrategy", "receiver.buffer.size" 8, = "debug.logging.redis" true, "topology.max.spout.pending" 1, = "storm.zookeeper.retry.interval" 1000, = "topology.sleep.spout.wait.strategy.time.ms" 1> , "platform" "xbox", = "nimbus.topology.validator" = "backtype.storm.nimbus.DefaultTopologyValidator", = "supervisor.slots.ports" [6700 6701 6702 6703 6704 6705 6706 6707 6708 = 6709 6710 6711 6712 6713 6714 6715 6716 6717 6718 6719 6720 6721], = "iwparser_port" 12330, "redis.port" 6379, "topology.debug" false, = "nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60, = "debug.logging.stdout" true, "topology.kryo.register" nil, = "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, = "dw_retries.max_attempts" 6, "topology.workers" 50, = "supervisor.childopts" "-verbose:gc -XX:+PrintGCTimeStamps = -XX:+PrintGCDetails -Dcom.sun.management.jmxremote = -Dcom.sun.management.jmxremote.ssl=3Dfalse = -Dcom.sun.management.jmxremote.authenticate=3Dfalse = -Dcom.sun.management.jmxremote.port=3D8091", "nimbus.thrift.port" 6627, = "redis_prefix" "prod", "topology.stats.sample.rate" 0.05, = "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" = "backtype.storm.serialization.types.ListDelegateSerializer", = "topology.disruptor.w> ait.strategy" = "com.lmax.disruptor.BlockingWaitStrategy", "http_proxy" "proxy", = "nimbus.task.timeout.secs" 30, "storm.zookeeper.connection.timeout" = 15000, "topology.kryo.factory" = "backtype.storm.serialization.DefaultKryoFactory", = "drpc.invocations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, = "storm.zookeeper.retry.times" 5, "storm.thrift.transport" = "backtype.storm.security.auth.SimpleTransportPlugin", = "topology.state.synchronization.timeout.secs" 60, = "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" = 600, "storm.messaging.transport" = "backtype.storm.messaging.netty.Context", "logviewer.appender.name" = "A1", "storm.messaging.netty.max_wait_ms" 1000, "debug.logging.kibana" = false, "iwparser_host" "10.200.6.145", "drpc.request.timeout.secs" 600, = "demonware.endpoint" "http://10.0.1.78:9000/matches/", = "redis_simple_prefix" "prod", "redis.host" = "storm090-redis.uvvujo.0001.use1.cache.amazonaws.com", = "dw_retries.wait_interval" 2, "storm.local.mode.zmq" false, "ui.port" = 8080, "nimbus.childopts> " "-verbose:gc -XX:+PrintGCTimeStamps = -XX:+PrintGCDetails -Dcom.sun.management.jmxremote = -Dcom.sun.management.jmxremote.ssl=3Dfalse = -Dcom.sun.management.jmxremote.authenticate=3Dfalse = -Dcom.sun.management.jmxremote.port=3D8090", "storm.cluster.mode" = "distributed", "topology.optimize" true, "topology.max.task.parallelism" = nil, "statsd.endpoint" "statsd", "redis.log.size" 100000} >> 2014-02-07 01:32:02 b.s.d.worker [INFO] Worker = 80483ed6-2c51-4f86-9d06-9d2bed705748 for storm = match_parser-12-1391735866 on f1b66213-7237-483e-a5db-e543bb36bcd9:6710 = has finished loading >> 2014-02-07 01:32:08 MatchRetrieveBolt [INFO] [BOLT-1] on_init >> 2014-02-07 01:32:08 MatchProcessBolt [INFO] = [BOlT-MatchProcessBolt] on_init >> 2014-02-07 01:32:08 b.s.d.executor [INFO] Prepared bolt 2:(2) >> 2014-02-07 01:32:08 b.s.d.executor [INFO] Prepared bolt 3:(52) >>=20 >>=20 >> The bolts have reinitialized. The Storm UI is showing that bolt1 = is still being handled by this specific worker. >>=20 >>=20 >>=20 >> The spout still sends tuples to matchRetrieveBolt but never gets a = response. The failure is caught and retries continue. >>=20 >>=20 >> Can anyone comment or provide any thoughts on the above issue? >>=20 >> Thanks, >> Mark >>=20 >>=20 >=20 >=20 --Apple-Mail=_6143E698-02FF-4F0D-BC11-5B83C12C73EE Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename=signature.asc Content-Type: application/pgp-signature; name=signature.asc Content-Description: Message signed with OpenPGP using GPGMail -----BEGIN PGP SIGNATURE----- Comment: GPGTools - https://gpgtools.org iQEcBAEBCgAGBQJS/YDGAAoJEI3gOWLoC4/9ELwIANlOb7E8nsD5R4wHkZQBwbkg GZ5YLjgvppKf8qq9+Eybp7rUYvy1Y5tbEGCb+TcDnr0vcYK0n7EkR1fuonw/tG6O zaUb7yFoEzfSgEOB4NTgsEgp5YBO4M/ITm3nbZ3zIxc50GguHuYEbQweRZs3hZUb jN+CffEoZVlRsbV5t+3lEKqMBo9Lmg3eZyRUOhfEYolMeqUiq7RQ9iReQm4tESzb Tcqy36BfMfuCzjTCl0ztO7hh6EKCyKVZnQun1Yt1zqTIYoeq4715oJwmb9GYAjgG MHjY6oJk2JBtdaTtR8OAEp5feiv2qDbitzkWDCEdsHbPLWEDpZZcQTT+jBncF94= =o7fr -----END PGP SIGNATURE----- --Apple-Mail=_6143E698-02FF-4F0D-BC11-5B83C12C73EE--