Subject: Re: Problems upgrading Job
From: Rick Mangi
Date: Thu, 12 Nov 2015 13:10:54 -0500
Message-Id: <86A466C7-8C90-431E-9A5D-92E223C9174F@chartbeat.com>
References: <56DD6D62-1F49-49FD-8BF7-1965B98BDB79@chartbeat.com>
To: dev@samza.apache.org

Hi Yi,

I pulled from master and built this morning.

Yes, that’s the output from JobRunner. I also tried setting a job.id to see if this was an issue migrating from an old task checkpoint topic but I got the same result.

Would you like me to open a jira ticket?

Thanks,

Rick

> On Nov 12, 2015, at 12:59 PM, Yi Pan wrote:
>
> Hi, Rick,
>
> Did you get the fix in SAMZA-723 in your test? And could you confirm that
> the errors are from JobRunner log?
>
> -Yi
>
> On Thu, Nov 12, 2015 at 8:48 AM, Rick Mangi wrote:
>
>> Hi,
>>
>> I’m trying to migrate our samza jobs to 0.10.0 snapshot (built against the
>> latest). Everything works fine running locally (although I had to make some
>> changes to the local grid’s kafka since the checkpointing seems to require
>> replication_factor > 1) but when I deploy it against my production yarn
>> cluster I get these errors.
>>
>> [yarnmaster01] out: 2015-11-12 10:40:53 ZkClient [INFO] zookeeper state changed (SyncConnected)
>> [yarnmaster01] out: 2015-11-12 10:40:53 ZkEventThread [INFO] Terminate ZkClient event thread.
>> [yarnmaster01] out: 2015-11-12 10:40:53 ZooKeeper [INFO] Session: 0x250233cdf57f2fa closed
>> [yarnmaster01] out: 2015-11-12 10:40:53 ClientCnxn [INFO] EventThread shut down
>> [yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemAdmin [INFO] Coordinator stream __samza_coordinator_metrics-reporter_1 already exists.
>> [yarnmaster01] out: 2015-11-12 10:40:53 JobRunner [INFO] Storing config in coordinator stream.
>> [yarnmaster01] out: 2015-11-12 10:40:53 CoordinatorStreamSystemProducer [INFO] Starting coordinator stream producer.
>> [yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemProducer [INFO] Creating a new producer for system mykafka.
>> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [INFO] ProducerConfig values:
>> [yarnmaster01] out: value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
>> [yarnmaster01] out: key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
>> [yarnmaster01] out: block.on.buffer.full = true
>> [yarnmaster01] out: retry.backoff.ms = 100
>> [yarnmaster01] out: buffer.memory = 33554432
>> [yarnmaster01] out: batch.size = 16384
>> [yarnmaster01] out: metrics.sample.window.ms = 30000
>> [yarnmaster01] out: metadata.max.age.ms = 300000
>> [yarnmaster01] out: receive.buffer.bytes = 32768
>> [yarnmaster01] out: timeout.ms = 30000
>> [yarnmaster01] out: max.in.flight.requests.per.connection = 1
>> [yarnmaster01] out: bootstrap.servers = [devstream01.chartbeat.net:9092]
>> [yarnmaster01] out: metric.reporters = []
>> [yarnmaster01] out: client.id = samza_producer-metrics_reporter-1-1447342853273-4
>> [yarnmaster01] out: compression.type = none
>> [yarnmaster01] out: retries = 2147483647
>> [yarnmaster01] out: max.request.size = 1048576
>> [yarnmaster01] out: send.buffer.bytes = 131072
>> [yarnmaster01] out: acks = 1
>> [yarnmaster01] out: reconnect.backoff.ms = 10
>> [yarnmaster01] out: linger.ms = 0
>> [yarnmaster01] out: metrics.num.samples = 2
>> [yarnmaster01] out: metadata.fetch.timeout.ms = 60000
>> [yarnmaster01] out:
>> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration batch.num.messages = null was supplied but isn't a known config.
>> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration producer.type = null was supplied but isn't a known config.
>> [yarnmaster01] out: Exception in thread "main" org.apache.samza.SamzaException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
>> [yarnmaster01] out: at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.send(CoordinatorStreamSystemProducer.java:115)
>> [yarnmaster01] out: at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.writeConfig(CoordinatorStreamSystemProducer.java:132)
>> [yarnmaster01] out: at org.apache.samza.job.JobRunner.run(JobRunner.scala:85)
>> [yarnmaster01] out: at org.apache.samza.job.JobRunner$.main(JobRunner.scala:43)
>> [yarnmaster01] out: at org.apache.samza.job.JobRunner.main(JobRunner.scala)
>> [yarnmaster01] out: Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
>> [yarnmaster01] out:
>>
>>
>> Warning: run() received nonzero return code 1 while executing
>> './bin/run-job.sh
>> -config-factory=org.apache.samza.config.factories.PropertiesConfigFactory
>> --config-path=file://$PWD/conf/metrics_reporter.properties'!
>>
>>
>> This looks similar to https://issues.apache.org/jira/browse/SAMZA-560 but
>> I’m not using a StreamAppender in log4j.
>>
>> Any ideas? My first thought is that I might have to delete the existing
>> checkpoint topics but that would mean we can’t migrate completely until the
>> 0.10.0 release unless we want to run snapshot code in production.
>>
>> Thanks!
>>
>> Rick