Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 84CFC200D1F for ; Fri, 29 Sep 2017 05:38:14 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 834901609EC; Fri, 29 Sep 2017 03:38:14 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 7BAF41609CD for ; Fri, 29 Sep 2017 05:38:13 +0200 (CEST) Received: (qmail 69501 invoked by uid 500); 29 Sep 2017 03:38:12 -0000 Mailing-List: contact user-help@flume.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@flume.apache.org Delivered-To: mailing list user@flume.apache.org Received: (qmail 69491 invoked by uid 99); 29 Sep 2017 03:38:12 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 29 Sep 2017 03:38:12 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id B1C84C4B73 for ; Fri, 29 Sep 2017 03:38:11 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 4.82 X-Spam-Level: **** X-Spam-Status: No, score=4.82 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, HTML_OBFUSCATE_20_30=2.441, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id s-D-TfyY1a82 for ; Fri, 29 Sep 2017 03:38:09 +0000 (UTC) Received: from mail-qt0-f176.google.com (mail-qt0-f176.google.com [209.85.216.176]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 376E35F29A for ; Fri, 29 Sep 2017 03:38:09 +0000 (UTC) Received: by mail-qt0-f176.google.com with SMTP id f15so122444qtf.7 for ; Thu, 28 Sep 2017 20:38:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=RR+D6dW5Sk8QIixchDLPRGFcX443dUitKxGuffMCLhk=; b=EG5xKcv7j86cmLUS+ZJfDbJtmTsNMKbJo9oebNw51IDxt+oXDfuTnQM2Xfe367UXJE RjO9Z4cz8RDDgSNwEz+ewpo967w+jSpNJF9jmk15Zu1ivamVJZ8agD0nc7FnhBsKrOn5 U6kj+yx/urw9e0PjmnWk3PBMAxApFKldwQzowDruFmqEpaMF26jI6lWU2J4np1KLVebX r5Xb0NQFNYP10S2CMaJg3LguTA0oXF/dDLEr9QcITGKAXYgFU1cwYRnXuvh8YsaSE5ys fUiy2cbahv3eXRoDNlaeFLaJ1syJEZ+NmA5UVN85nP2ICzbgG+vC5pon8KlKS3N5cQFe 2/5A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=RR+D6dW5Sk8QIixchDLPRGFcX443dUitKxGuffMCLhk=; b=lAyqqwTe8yIBg1gE3BCBHIxLmMvMf55kR65pNJm67q7kREyR6pRpYUUy7pv3qPYGdF 1LmTac8WecmtS9DGbMuw43OPuAnEKSC4LPv9y8aPTOOODln+xN7/ztZWuRxE6grfaqxl R7p6PptBDxhDiXy57b8iimgI6V/CBfJ6153dYhY++Tumfdi9xTpYo54WgrjJOLgVUUdj LF88CZBxmSRPmHnAOW7h2JLksNOqEUyJSkShGr7hyQ1a6XFpPfUYzULl/IgHpnz20o+B nwrZVOUKRYieGeRbD2W+7z1ECJHmjvGtKJyp8fimi7a7GbM7pIsjrgopenNvJJECrEzH TzHQ== X-Gm-Message-State: AMCzsaUuXtA9UcIajPwA2PJRNdbuD4BE4nywGYi+INY7lhYuJPRMS5EO sp6vvCkkPjj32B8ChNXJ+kO1x79yA2YqqJLrKxoIpw== X-Google-Smtp-Source: AOwi7QCpB0gYA9TrGunUsYOuLAN/1SQzJrqLZLPQOzOBIQPNMdqOoUoZ67YC2qnlczQTzxY7uhZBzFUAzmOtZV86yhE= X-Received: by 10.200.56.184 with SMTP id f53mr3968165qtc.139.1506656283548; Thu, 28 Sep 2017 20:38:03 -0700 (PDT) MIME-Version: 1.0 Received: by 10.200.52.143 with HTTP; Thu, 28 Sep 2017 20:37:43 -0700 (PDT) In-Reply-To: References: From: wenxing zheng Date: Fri, 29 Sep 2017 11:37:43 +0800 Message-ID: Subject: Re: Failure in committing offset due to group rebalance To: user@flume.apache.org Content-Type: multipart/alternative; boundary="001a114368b6946174055a4bc0c0" archived-at: Fri, 29 Sep 2017 03:38:14 -0000 --001a114368b6946174055a4bc0c0 Content-Type: text/plain; charset="UTF-8" Thanks to Ferenc. We have do various adjustment on those settings. And we found that the case was due to Saturation of network bandwidth, and no matter what we set, it will get timeout. But the problem is after the network restored, Flume will not continue to work. On Thu, Sep 28, 2017 at 8:40 PM, Ferenc Szabo wrote: > Dear Wenxing, > > If I guess correctly you have time periods with very few messages and that > is when the issue happen. > If that is the case: > try to increase > kafka.consumer.heartbeat.interval.ms > and > kafka.consumer.session.timeout.ms > (session.timeout have to be more than the heartbeat interval) > > or lower the > kafka.consumer.max.partition.fetch.bytes to a little bit more than the > max size of 1 event. > > basically you can tweak kafka settings with > .kafka.consumer.* > and > .kafka.producer.* > > any setting you find here: http://kafka.apache.org/090/documentation.html > can be set with this method. > > Let us know if that helped or if some other config modification solved the > issue. > > Best Regards, > Ferenc Szabo > > On Thu, Sep 28, 2017 at 8:20 AM, wenxing zheng > wrote: > >> Dear all, >> >> We are running Flume v1.7.0 with Http Source and HDFS sink in pair, which >> are making use of the Kafka as the channel. And we often see the Exception >> in the HDFSEventSink with the following exception: >> >> 28 Sep 2017 11:52:14,683 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] >>> (org.apache.kafka.clients.consumer.internals.ConsumerCoordin >>> ator$OffsetCommitResponseHandler.handle:550) - Error >>> ILLEGAL_GENERATION occurred while committing offsets for group >>> csdn.flume.http.kafka.hdfs >>> 28 Sep 2017 11:52:14,684 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] >>> (org.apache.flume.sink.hdfs.HDFSEventSink.process:447) - process failed >>> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot >>> be completed due to group rebalance >>> at org.apache.kafka.clients.consumer.internals.ConsumerCoordina >>> tor$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:552) >>> at org.apache.kafka.clients.consumer.internals.ConsumerCoordina >>> tor$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:493) >>> at org.apache.kafka.clients.consumer.internals.AbstractCoordina >>> tor$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665) >>> at org.apache.kafka.clients.consumer.internals.AbstractCoordina >>> tor$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644) >>> at org.apache.kafka.clients.consumer.internals.RequestFuture$1. >>> onSuccess(RequestFuture.java:167) >>> at org.apache.kafka.clients.consumer.internals.RequestFuture. >>> fireSuccess(RequestFuture.java:133) >>> at org.apache.kafka.clients.consumer.internals.RequestFuture. >>> complete(RequestFuture.java:107) >>> at org.apache.kafka.clients.consumer.internals.ConsumerNetworkC >>> lient$RequestFutureCompletionHandler.onComplete(ConsumerNetw >>> orkClient.java:380) >>> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient. >>> java:274) >>> at org.apache.kafka.clients.consumer.internals.ConsumerNetworkC >>> lient.clientPoll(ConsumerNetworkClient.java:320) >>> at org.apache.kafka.clients.consumer.internals.ConsumerNetworkC >>> lient.poll(ConsumerNetworkClient.java:213) >>> at org.apache.kafka.clients.consumer.internals.ConsumerNetworkC >>> lient.poll(ConsumerNetworkClient.java:193) >>> at org.apache.kafka.clients.consumer.internals.ConsumerNetworkC >>> lient.poll(ConsumerNetworkClient.java:163) >>> at org.apache.kafka.clients.consumer.internals.ConsumerCoordina >>> tor.commitOffsetsSync(ConsumerCoordinator.java:358) >>> at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync( >>> KafkaConsumer.java:968) >>> at org.apache.flume.channel.kafka.KafkaChannel$ConsumerAndRecor >>> ds.commitOffsets(KafkaChannel.java:684) >>> at org.apache.flume.channel.kafka.KafkaChannel$KafkaTransaction >>> .doCommit(KafkaChannel.java:567) >>> at org.apache.flume.channel.BasicTransactionSemantics.commit(Ba >>> sicTransactionSemantics.java:151) >>> at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSi >>> nk.java:433) >>> at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSi >>> nkProcessor.java:67) >>> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.jav >>> a:145) >>> at java.lang.Thread.run(Thread.java:745) >>> 28 Sep 2017 11:52:14,716 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] >>> (org.apache.flume.SinkRunner$PollingRunner.run:158) - Unable to >>> deliver event. Exception follows. >>> org.apache.flume.EventDeliveryException: org.apache.kafka.clients.consumer.CommitFailedException: >>> Commit cannot be completed due to group rebalance >>> at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSi >>> nk.java:451) >>> at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSi >>> nkProcessor.java:67) >>> at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.jav >>> a:145) >>> at java.lang.Thread.run(Thread.java:745) >>> Caused by: org.apache.kafka.clients.consumer.CommitFailedException: >>> Commit cannot be completed due to group rebalance >> >> >> Is the problem related with the JIRA ticket: >> https://issues.apache.org/jira/browse/KAFKA-3409 and we need to upgrade >> the Kafka library to 0.10.0.0? >> >> Appreciated for any advice. >> Kind Regards, Wenxing >> > > --001a114368b6946174055a4bc0c0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Thanks to Ferenc.

We have do various ad= justment on those settings. And we found that the case was due to Saturatio= n of network bandwidth, and no matter what we set, it will get timeout.
But the problem is after the network restored, Flume will not contin= ue to work.

On Thu, Sep 28, 2017 at 8:40 PM, Ferenc Szabo <<= a href=3D"mailto:fszabo@cloudera.com" target=3D"_blank">fszabo@cloudera.com= > wrote:
<= div>Dear Wenxing,

If I guess correctly you have ti= me periods with very few messages and that is when the issue happen.
<= div>If that is the case:
try to increase=C2=A0
and=C2=A0
(session.timeout have to be more = than the heartbeat interval)

or lower the
kafka.consumer.max.partition.fetch.bytes to a little bit more than t= he max size of 1 event.

basically you can tweak ka= fka settings with
<channel>.kafka.consumer.*
and= =C2=A0
<channel>.kafka.producer.*

can be set with this method.

L= et us know if that helped or if some other config modification solved the i= ssue.

Best Regards,
Ferenc Szabo

On Thu, Sep 28, 2017 at 8:20 AM, wenxing zheng <wenxing.zheng@gmail.com> wrote:
Dear all,

We are running Flume v1= .7.0 with Http Source and HDFS sink in pair, which are making use of the Ka= fka as the channel. And we often see the Exception in the HDFSEventSink wit= h the following exception:

28 Sep 2017 11:52:14,683 ERROR [SinkRunner-Polli= ngRunner-DefaultSinkProcessor] (org.apache.kafka.clients.consumer= .internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle= :550) =C2=A0- Error ILLEGAL_GENERATION occurred while committing offsets fo= r group csdn.flume.http.kafka.hdfs
28 Sep 2017 11:52:14,684 ERROR [SinkR= unner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.= HDFSEventSink.process:447) =C2=A0- process failed
org.apache.kafka.= clients.consumer.CommitFailedException: Commit cannot be completed due= to group rebalance
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.kafka.clie= nts.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHa= ndler.handle(ConsumerCoordinator.java:552)
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 at org.apache.kafka.clients.consumer.internals.ConsumerCoor= dinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.j<= wbr>ava:493)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.kafka.clients.con= sumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.ja= va:644)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.kafka.clients.consumer.i= nternals.RequestFuture.fireSuccess(RequestFuture.java:133)
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.kafka.clients.consumer.internal= s.RequestFuture.complete(RequestFuture.java:107)
=C2=A0 =C2=A0= =C2=A0 =C2=A0 at org.apache.kafka.clients.consumer.internals.Consumer= NetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerN= etworkClient.java:380)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.ka= fka.clients.NetworkClient.poll(NetworkClient.java:274)
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 at org.apache.kafka.clients.consumer.internals.Co= nsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.kafka.clients.consumer.inte= rnals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)<= br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.kafka.clients.consumer.in= ternals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193= )
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.kafka.clients.consumer.= internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:1= 63)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.kafka.clients.consume= r.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordin= ator.java:358)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.kafka.clients.c= onsumer.KafkaConsumer.commitSync(KafkaConsumer.java:968)
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.flume.channel.kafka.KafkaChanne= l$ConsumerAndRecords.commitOffsets(KafkaChannel.java:684)
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.flume.channel.kafka.KafkaChanne= l$KafkaTransaction.doCommit(KafkaChannel.java:567)
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 at org.apache.flume.channel.BasicTransactionSemantic= s.commit(BasicTransactionSemantics.java:151)
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEve= ntSink.java:433)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.flume.si= nk.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.flume.SinkRunner$PollingRunn= er.run(SinkRunner.java:145)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at java.lan= g.Thread.run(Thread.java:745)
28 Sep 2017 11:52:14,716 ERROR [SinkR= unner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner= $PollingRunner.run:158) =C2=A0- Unable to deliver event. Exception fol= lows.
org.apache.flume.EventDeliveryException: org.apache.kafka.cli= ents.consumer.CommitFailedException: Commit cannot be completed due to= group rebalance
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.flume.sink.hd= fs.HDFSEventSink.process(HDFSEventSink.java:451)
=C2=A0 =C2=A0= =C2=A0 =C2=A0 at org.apache.flume.sink.DefaultSinkProcessor.process(D= efaultSinkProcessor.java:67)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apa= che.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 at java.lang.Thread.run(Thread.java:745)Caused by: org.apache.kafka.clients.consumer.CommitFailedException: = Commit cannot be completed due to group rebalance
Is the problem related with the JIRA ticket: https://issue= s.apache.org/jira/browse/KAFKA-3409 and we need to upgrade the Kaf= ka library to 0.10.0.0?

Appreciated for any advice= .
Kind Regards, Wenxing


--001a114368b6946174055a4bc0c0--