From: Jason Jackson <jasonjckn@gmail.com>
Date: Thu, 10 Apr 2014 01:39:03 -0700
Subject: Re: Topology is stuck
To: user@storm.incubator.apache.org

My idea for the bug was that Trident expects to read back from ZooKeeper what it recently wrote to ZooKeeper for the same znode, and because ZooKeeper provides only sequential consistency it sometimes reads an older value even though it just wrote a newer one. I could be way off the mark though; it's just an idea to explore more.
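For concreteness, here is a minimal, hypothetical sketch of that read-after-write pattern against the plain ZooKeeper Java API. The connect string and znode path are invented, and this is not Trident's actual coordinator code; the point is only that a reader connected to a lagging follower may see the old value, and that calling sync() first forces that server to catch up before the read.

import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ReadAfterWriteSketch {
    // Hypothetical connect string and znode path.
    static final String CONNECT = "localhost:2181";
    static final String PATH = "/read-after-write-demo";

    public static void main(String[] args) throws Exception {
        ZooKeeper writer = new ZooKeeper(CONNECT, 30000, event -> { });
        ZooKeeper reader = new ZooKeeper(CONNECT, 30000, event -> { });

        if (writer.exists(PATH, false) == null) {
            writer.create(PATH, "0".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        writer.setData(PATH, "1".getBytes(), -1);

        // The reader may be connected to a follower that has not yet applied
        // the write above; sequential consistency allows it to return the old
        // value. sync() makes the reader's server catch up with the leader
        // before the subsequent read.
        CountDownLatch synced = new CountDownLatch(1);
        reader.sync(PATH, (rc, path, ctx) -> synced.countDown(), null);
        synced.await();

        byte[] data = reader.getData(PATH, false, new Stat());
        System.out.println("value seen by reader: " + new String(data));

        writer.close();
        reader.close();
    }
}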
On Thu, Apr 10, 2014 at 1:36 AM, Jason Jackson <jasonjckn@gmail.com> wrote:

> Hi Ted, thanks for clearing up the language; I intended to express sequential consistency then.
>
> Yes, you could do a forced sync too; that would be another good way to test.
>
> Taylor, the bug that I witnessed only occurs after you leave a Trident topology running for at least a day. One day it'll just stop making progress and re-attempt the same batch forever. Unfortunately I can't send the particular Trident code to you, but I don't think there's anything unique about it. I suspect any Trident topology could reproduce the bug if run for a week. Other correlated factors may include that the Trident topology has to occasionally fail batches, and that the ZooKeeper cluster has to be under significant load from other applications beyond Trident. I don't have many more details, unfortunately.
>
> -Jason
>
>
> On Wed, Apr 9, 2014 at 3:03 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
>> In what sense do you mean when you say that reads in ZK are eventually consistent?
>>
>> You may get a slightly old value, but you are guaranteed to see a consistent history. That is, if a znode has values (which include version numbers) v_1 ... v_n, then if you see v_i, you will never see v_j where j < i.
>>
>> You can also guarantee that you don't even see delayed values by using sync.
>>
>> Normally when people say "eventually consistent" they mean that two participants can see inconsistent histories under partition. That isn't possible in ZK. As I understand it, ZK would be better described as providing sequential consistency, since all observers will see all updates in the same order.
>>
>>
>> On Wed, Apr 9, 2014 at 2:50 PM, Jason Jackson <jasonjckn@gmail.com> wrote:
>>
>>> I have one theory that reads in ZooKeeper being eventually consistent is a necessary condition for the bug to manifest. One way to test this hypothesis is to run a ZooKeeper ensemble with 1 node, or a ZooKeeper ensemble configured for 5 nodes but with 2 of them taken offline, so that every write operation only succeeds if every live member of the ensemble sees the write. This should produce strongly consistent reads. If you run this test, let me know what the results are. (Clearly this isn't a good production setup, since you're trading lower availability for greater consistency, but the results could help narrow down the bug.)
>>>
>>>
>>> On Wed, Apr 9, 2014 at 2:43 PM, Jason Jackson <jasonjckn@gmail.com> wrote:
>>>
>>>> Yeah, it's probably a bug in Trident. It would be amazing if someone figured out the fix for this. I spent about 6 hours looking into it, but couldn't figure out why it was occurring.
>>>>
>>>> Beyond fixing this, one thing you could do to buy yourself time is to disable batch retries in Trident. There's no option for this in the API, but it's about a 1 or 2 line change to the code. Obviously you lose exactly-once semantics, but at least you would have a system that never falls behind real time.
>>>>
>>>>
>>>> On Wed, Apr 9, 2014 at 1:10 AM, Danijel Schiavuzzi <danijel@schiavuzzi.com> wrote:
>>>>
>>>>> Thanks Jason. However, I don't think that was the case in my stuck topology, otherwise I'd have seen exceptions (thrown by my Trident functions) in the worker logs.
>>>>>
>>>>>
>>>>> On Wed, Apr 9, 2014 at 3:02 AM, Jason Jackson <jasonjckn@gmail.com> wrote:
>>>>>
>>>>>> An example of "corrupted input" that causes a batch to fail would be if you expected the data you read off Kafka (or some other queue) to conform to a schema, it didn't for whatever reason, and the function you pass to stream.each() throws an exception when that unexpected situation occurs. This would cause the batch to be retried, but it's deterministically failing, so the batch will be retried forever.
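To make that failure mode concrete, here is a minimal, hypothetical each() function written against the Trident Java API of that era (storm.trident.operation.BaseFunction); the single-field layout and the "key:value" format are invented, not from the original topology. If the parse failure is deterministic, the same malformed tuple comes back on every attempt and the batch is re-emitted forever; dropping the bad tuple instead trades a lost record for forward progress.

import backtype.storm.tuple.Values;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

// Hypothetical parser for records read off Kafka; assumes one input field
// holding "key:value" strings.
public class ParseKeyValue extends BaseFunction {
    private final boolean failOnBadInput;

    public ParseKeyValue(boolean failOnBadInput) {
        this.failOnBadInput = failOnBadInput;
    }

    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        String raw = tuple.getString(0);
        String[] parts = (raw == null) ? new String[0] : raw.split(":", 2);
        if (parts.length != 2) {
            if (failOnBadInput) {
                // Deterministic failure: this batch will be retried forever,
                // because the same malformed record reappears on every attempt.
                throw new IllegalArgumentException("malformed record: " + raw);
            }
            // Tolerant alternative: drop the malformed record and move on.
            return;
        }
        collector.emit(new Values(parts[0], parts[1]));
    }
}

It would be attached with something like stream.each(new Fields("bytes"), new ParseKeyValue(true), new Fields("key", "value")), where the field names are again just placeholders.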
>>>>>>
>>>>>> On Mon, Apr 7, 2014 at 10:37 AM, Danijel Schiavuzzi <danijel@schiavuzzi.com> wrote:
>>>>>>
>>>>>>> Hi Jason,
>>>>>>>
>>>>>>> Could you be more specific -- what do you mean by "corrupted input"? Do you mean that there's a bug in Trident itself that causes the tuples in a batch to somehow become corrupted?
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>>
>>>>>>> Danijel
>>>>>>>
>>>>>>>
>>>>>>> On Monday, April 7, 2014, Jason Jackson <jasonjckn@gmail.com> wrote:
>>>>>>>
>>>>>>>> This could happen if you have corrupted input that always causes a batch to fail and be retried.
>>>>>>>>
>>>>>>>> I have seen this behaviour before and I didn't see corrupted input. It might be a bug in Trident, I'm not sure. If you figure it out, please update this thread and/or submit a patch.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 31, 2014 at 7:39 AM, Danijel Schiavuzzi <danijel@schiavuzzi.com> wrote:
>>>>>>>>
>>>>>>>> To (partially) answer my own question -- I still have no idea about the cause of the stuck topology, but re-submitting the topology helps: after re-submitting, my topology is now running normally.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 26, 2014 at 6:04 PM, Danijel Schiavuzzi <danijel@schiavuzzi.com> wrote:
>>>>>>>>
>>>>>>>> Also, I did have multiple cases of my IBackingMap workers dying (because of RuntimeExceptions) but successfully restarting afterwards. I throw RuntimeExceptions in my IBackingMap implementation as my strategy for rare SQL database deadlock situations, to force a worker restart and to fail and retry the batch.
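As a minimal, hypothetical sketch of that fail-on-deadlock strategy against Trident's IBackingMap interface: the JDBC details and helper methods are invented, and this is not the topology's real state implementation. The relevant part is the error handling in multiPut, where a SQLException is rethrown unchecked so the worker dies, restarts, and the batch is failed and retried.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.List;
import storm.trident.state.map.IBackingMap;

// Hypothetical SQL-backed map keyed by Trident group-by keys, storing counts.
public class SqlBackingMap implements IBackingMap<Long> {
    private final String jdbcUrl;

    public SqlBackingMap(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    @Override
    public List<Long> multiGet(List<List<Object>> keys) {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            return selectCounts(conn, keys); // invented helper, omitted here
        } catch (SQLException e) {
            throw new RuntimeException("multiGet failed", e);
        }
    }

    @Override
    public void multiPut(List<List<Object>> keys, List<Long> vals) {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            upsertCounts(conn, keys, vals); // invented helper, omitted here
        } catch (SQLException e) {
            // Rethrowing unchecked kills the worker; after the restart the
            // batch is failed and retried. A real implementation might check
            // e.getSQLState() and only do this for deadlocks.
            throw new RuntimeException("multiPut failed, forcing worker restart", e);
        }
    }

    private List<Long> selectCounts(Connection conn, List<List<Object>> keys) throws SQLException {
        throw new UnsupportedOperationException("omitted from the sketch");
    }

    private void upsertCounts(Connection conn, List<List<Object>> keys, List<Long> vals) throws SQLException {
        throw new UnsupportedOperationException("omitted from the sketch");
    }
}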
>>>>>>>>
>>>>>>>> From the logs, one such IBackingMap worker death (and subsequent restart) resulted in the Kafka spout re-emitting the pending tuple:
>>>>>>>>
>>>>>>>>     2014-03-22 16:26:43 s.k.t.TridentKafkaEmitter [INFO] re-emitting batch, attempt 29698959:736
>>>>>>>>
>>>>>>>> This is of course the normal behavior of a transactional topology, but it is the first time I've encountered a case of a batch retrying indefinitely. It is especially suspicious since the topology had been running fine for 20 days straight, re-emitting batches and restarting IBackingMap workers quite a number of times.
>>>>>>>>
>>>>>>>> I can see in the SQL database backing my IBackingMap that the batch with the exact txid value 29698959 has been committed -- but I suspect that could come from the other BackingMap, since there are two BackingMap instances running (parallelismHint 2).
>>>>>>>>
>>>>>>>> However, I have no idea why the batch is now being retried indefinitely, nor why it hasn't been successfully acked by Trident.
>>>>>>>>
>>>>>>>> Any suggestions on the area (topology component) to focus my research on?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 26, 2014 at 5:32 PM, Danijel Schiavuzzi <danijel@schiavuzzi.com> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I'm having problems with my transactional Trident topology. It had been running fine for about 20 days, and suddenly it is stuck processing a single batch, with no tuples being emitted and no tuples being persisted by the TridentState (IBackingMap).
>>>>>>>>
>>>>>>>> It's a simple topology which consumes messages off a Kafka queue. The spout is an instance of the storm-kafka-0.8-plus TransactionalTridentKafkaSpout, and I use the trident-mssql transactional TridentState implementation to persistentAggregate() data into a SQL database.
>>>>>>>>
>>>>>>>> In ZooKeeper I can see Storm is re-trying a batch, i.e.
>>>>>>>>
>>>>>>>>     "/transactional/<myTopologyName>/coordinator/currattempts" is "{"29698959":6487}"
>>>>>>>>
>>>>>>>> ... and the attempt count keeps increasing. It seems the batch with txid 29698959 is stuck, as the attempt count in ZooKeeper keeps climbing -- it looks like the batch isn't being acked by Trident, and I have no idea why, especially since the topology had been running successfully for the last 20 days.
>>>>>>>>
>>>>>>>> I did rebalance the topology on one occasion, after which it continued running normally. Other than that, no other modifications were done. Storm is at version 0.9.0.1.
>>>>>>>>
>>>>>>>> Any hints on how to debug the stuck topology? Any other useful info I might provide?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> --
>>>>>>>> Danijel Schiavuzzi
>>>>>>>>
>>>>>>>> E: danijel@schiavuzzi.com
>>>>>>>> W: www.schiavuzzi.com
>>>>>>>> T: +385989035562
>>>>>>>> Skype: danijel.schiavuzzi
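For watching the attempt counter quoted above, a small, hypothetical helper against the ZooKeeper Java client can be useful; the connect string and topology name are placeholders for whatever the cluster actually uses. A healthy topology advances to new transaction ids, while a stuck one keeps showing the same txid with a growing attempt count (e.g. {"29698959":6487}).

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Hypothetical monitor for the Trident coordinator attempt counter.
public class WatchCurrAttempts {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
        String path = "/transactional/myTopologyName/coordinator/currattempts";
        Stat stat = new Stat();
        for (int i = 0; i < 10; i++) {
            byte[] data = zk.getData(path, false, stat);
            // Print the {txid: attemptCount} map plus the znode version so
            // repeated runs show whether the same txid keeps being retried.
            System.out.println(new String(data) + " (znode version " + stat.getVersion() + ")");
            Thread.sleep(5000);
        }
        zk.close();
    }
}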