From: Daria Mayorova <d.mayorova@gmail.com>
Date: Wed, 2 Apr 2014 14:56:52 +0200
Subject: Tuples lost in Storm 0.9.1
To: user@storm.incubator.apache.org

Hi everyone,

We are having some issues with our Storm topology. The problem is that some tuples are being lost somewhere in the topology. Just after the topology is deployed it runs fine, but after several hours it starts to lose a significant number of tuples. From what we've found in the logs, the tuples exit one bolt/spout but never enter the next bolt.

Here is some info about the topology:

- The version is 0.9.1, and Netty is used as the transport
- The spout extends BaseRichSpout, and the bolts extend BaseBasicBolt (a minimal sketch of this structure follows below)
- The spout reads from a Kestrel message queue
- The cluster consists of 2 nodes: zookeeper, nimbus and ui run on one node, and the workers run on the other node. I am attaching the content of the config files below. We have also tried running the workers on the other node (the same one where nimbus and zookeeper are), and also on both nodes, but the behavior is the same.
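In case it helps, here is a rough sketch of what the spout and bolts look like. This is not our exact code: the class names (QueueSpout, ProcessBolt), the field name "message" and the pollQueue() stub are simplified placeholders. The spout emits each tuple with a message ID so it is tracked by the ackers, and BaseBasicBolt anchors and acks automatically:

import java.util.Map;
import java.util.UUID;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Placeholder spout: in the real topology this reads from Kestrel.
class QueueSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        String message = pollQueue(); // placeholder for reading from the queue
        if (message != null) {
            // Emitting with a message ID makes the tuple tracked by the ackers,
            // so a lost tuple should eventually show up as Failed in the UI.
            collector.emit(new Values(message), UUID.randomUUID().toString());
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("message"));
    }

    private String pollQueue() {
        return null; // stub
    }
}

// Placeholder bolt: BaseBasicBolt anchors emitted tuples to the input
// and acks the input automatically after execute() returns.
class ProcessBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String message = input.getStringByField("message");
        collector.emit(new Values(message.toUpperCase()));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("message"));
    }
}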
According to the Storm UI there are no Failed tuples. Can anybody give an idea of what might be causing the tuples to get lost?

Thanks.

*Storm config (storm.yaml)*
(In case both nodes have workers running, the configuration is the same on both nodes; only the "storm.local.hostname" parameter changes)

storm.zookeeper.servers:
    - "zkserver1"
nimbus.host: "nimbusserver"
storm.local.dir: "/mnt/storm"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
storm.local.hostname: "storm1server"

nimbus.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
ui.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
supervisor.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
worker.childopts: "-Xmx3548m -Djava.net.preferIPv4Stack=true"

storm.cluster.mode: "distributed"
storm.local.mode.zmq: false
storm.thrift.transport: "backtype.storm.security.auth.SimpleTransportPlugin"

storm.messaging.transport: "backtype.storm.messaging.netty.Context"

storm.messaging.netty.server_worker_threads: 1
storm.messaging.netty.client_worker_threads: 1
storm.messaging.netty.buffer_size: 5242880 #5MB buffer
storm.messaging.netty.max_retries: 30
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100

*Zookeeper config (zoo.cfg):*
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
autopurge.purgeInterval=24
autopurge.snapRetainCount=5
server.1=localhost:2888:3888

*Topology configuration* passed to the StormSubmitter (the full builder/submit call is sketched below):

Config conf = new Config();
conf.setNumAckers(6);
conf.setNumWorkers(4);
conf.setMaxSpoutPending(100);
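For completeness, this is roughly how the topology is built and submitted. The component ids ("queue-spout", "process-bolt"), the parallelism hints, and the topology name "our-topology" are illustrative placeholders, and QueueSpout/ProcessBolt refer to the sketch above:

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class TopologySubmitter {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // Component ids and parallelism hints are placeholders.
        builder.setSpout("queue-spout", new QueueSpout(), 2);
        builder.setBolt("process-bolt", new ProcessBolt(), 4)
               .shuffleGrouping("queue-spout");

        // Same settings as shown above.
        Config conf = new Config();
        conf.setNumAckers(6);
        conf.setNumWorkers(4);
        conf.setMaxSpoutPending(100);

        StormSubmitter.submitTopology("our-topology", conf, builder.createTopology());
    }
}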
Best regards,
Daria Mayorova