Subject: Re: corrupted topic
From: Tim Lossen
To: kafka-users@incubator.apache.org
Date: Wed, 26 Oct 2011 09:28:45 +0200

neha,

>>> it looked as if some of them use the same logic as the ruby client, and might also be affected.
>
> Could you please list the clients that you think might have the same bug ?

well, the c#, go and php client seem all to be doing more or less
the same thing as the ruby client, as far as i can tell ;)

when consuming a batch of messages, the last thing in the
response buffer will often be an incomplete message. the logic
to detect and skip this was broken, which results in a subsequent
parsing error.

cheers
tim

On 2011-10-25, at 19:17 , Neha Narkhede wrote:

> Tim,
>
> Thanks for looking into this !
>
>>> it looked as if some of them use the same logic as the ruby client, and might also be affected.
>
> Could you please list the clients that you think might have the same bug ?
>
>> are you actually using any client besides the java / scala one in production at linkedin?
>
> No.
>
> Thanks,
> Neha
>
> On Tue, Oct 25, 2011 at 9:16 AM, Tim Lossen wrote:
>> jun,
>>
>> the ruby client (maintained by alejandro crosa) is here:
>>
>> https://github.com/acrosa/kafka-rb
>>
>> i noticed just now that he also seems to work at linkedin?
>>
>> last commit on github is a bugfix, on october 14. but there
>> also seem to be some changes on the apache side which are not
>> in the github version ... maybe alejandro can best sort this
>> out himself.
>>
>> while looking over the other clients when we were hunting for
>> this bug, it looked as if some of them use the same logic as
>> the ruby client, and might also be affected.
>>
>> are you actually using any client besides the java / scala one
>> in production at linkedin?
>>
>> cheers
>> tim
>>
>> On 2011-10-25, at 17:58 , Jun Rao wrote:
>>
>>> Tim,
>>>
>>> Thanks for the update. What's the github url for the ruby client? Has it
>>> diverged from what's in Apache?
>>>
>>> I agree with you that we should consider excluding those clients not well
>>> maintained from our distribution.
>>>
>>> Jun
>>>
>>> On Tue, Oct 25, 2011 at 7:46 AM, Tim Lossen wrote:
>>>
>>>> ok, we finally traced this issue to a bug in the ruby kafka client,
>>>> which we were able to fix -- the topic was never corrupted.
>>>>
>>>> (we sent a pull request to the maintainer of the client on github.)
>>>>
>>>> BTW, i do not think that it is a good idea to include an (outdated)
>>>> copy of the ruby client (and other clients) in the kafka distribution.
>>>> maybe better *link* to the actual client projects?
>>>>
>>>> cheers
>>>> tim
>>>>
>>>> On 2011-10-24, at 21:30 , Tim Lossen wrote:
>>>>
>>>>> ok, thanks -- tomorrow we'll try to investigate further ...
>>>>>
>>>>> On 2011-10-24, at 9:12 PM, Neha Narkhede wrote:
>>>>>
>>>>>> Tim,
>>>>>>
>>>>>>> what if the CRC32 checksum is correct, but the internal binary message
>>>>>>> structure is not?
>>>>>>
>>>>>> The CRC check involves recomputing the CRC and then checking against
>>>>>> the stored CRC in the header. The probability of that matching is
>>>>>> extremely low.
>>>>>>
>>>>>> Corruption is also possible if the broker crashes in the middle of a
>>>>>> flush. In that case, when the broker restarts, it detects an unclean
>>>>>> shutdown, runs recovery on the logs and truncates the log if the CRC
>>>>>> check fails at some message.
>>>>>>
>>>>>> Also, we compute the CRC only on the payload of the message. So
>>>>>> technically, some bits could get flipped in the header of the message.
>>>>>>
>>>>>> Thanks,
>>>>>> Neha
>>>>>>
>>>>>> On Mon, Oct 24, 2011 at 12:07 PM, Tim Lossen wrote:
>>>>>>> what if the CRC32 checksum is correct, but the internal binary message
>>>>>>> structure is not?
>>>>>>>
>>>>>>> On 2011-10-24, at 8:56 PM, Jay Kreps wrote:
>>>>>>>
>>>>>>>> It is not supposed to be possible. We include a CRC32 with each message,
>>>>>>>> so invalid requests should be detected and rejected. But that does not
>>>>>>>> preclude the possibility that we missed a case.
>>>>>>>>
>>>>>>>> -Jay
>>>>>>>>
>>>>>>>> On Mon, Oct 24, 2011 at 11:41 AM, Tim Lossen wrote:
>>>>>>>>
>>>>>>>>> hi,
>>>>>>>>>
>>>>>>>>> is it possible for a faulty client to "corrupt" a topic on the broker,
>>>>>>>>> so that consumers cannot consume messages any more? or does
>>>>>>>>> the broker protect itself against this?
>>>>>>>>>
>>>>>>>>> i am asking because we seem to have run into such a situation.
>>>>>>>>> we are using a perl producer and a ruby consumer. the perl lib might
>>>>>>>>> be a bit outdated.
>>>>>>>>>
>>>>>>>>> cheers
>>>>>>>>> tim
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> http://tim.lossen.de
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> http://tim.lossen.de
>>>>>>>
>>>>>
>>>>> --
>>>>> http://tim.lossen.de
>>>>>
>>>>
>>>> --
>>>> http://tim.lossen.de
>>>>
>>
>> --
>> http://tim.lossen.de
>>

--
http://tim.lossen.de
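
For readers of the archive: the failure mode described at the top of this thread (a fetch response that ends in an incomplete message, which the client must detect and skip) can be sketched roughly as below. This is a minimal illustration, not the ruby client's code and not the real Kafka wire format. The layout (4-byte big-endian length, 4-byte CRC32 of the payload, then the payload) and the names encode_message and parse_messages are assumptions made up for this example. The only details taken from the thread are that a batch can end in a partial message and that the CRC covers the payload only.

```python
import struct
import zlib

# ASSUMED layout, for illustration only (not the actual Kafka format):
#   4-byte big-endian length N, followed by N bytes made of
#   a 4-byte CRC32 of the payload and then the payload itself.
LENGTH = struct.Struct(">I")
CRC = struct.Struct(">I")


def encode_message(payload: bytes) -> bytes:
    """Hypothetical helper: frame one payload in the assumed layout."""
    body = CRC.pack(zlib.crc32(payload) & 0xFFFFFFFF) + payload
    return LENGTH.pack(len(body)) + body


def parse_messages(buf: bytes):
    """Return (payloads, consumed) for all complete messages in buf.

    A truncated message at the end of the buffer is not an error:
    parsing stops in front of it, and `consumed` tells the caller
    how many bytes were actually used.
    """
    payloads = []
    pos = 0
    while True:
        # Not even a full length prefix left: trailing partial message.
        if len(buf) - pos < LENGTH.size:
            break
        (size,) = LENGTH.unpack_from(buf, pos)
        if size < CRC.size:
            raise ValueError("message at offset %d is too short" % pos)
        end = pos + LENGTH.size + size
        # Length prefix is complete but the body is cut off: stop here.
        if end > len(buf):
            break
        (stored_crc,) = CRC.unpack_from(buf, pos + LENGTH.size)
        payload = buf[pos + LENGTH.size + CRC.size:end]
        # Recompute the CRC over the payload and compare with the stored one.
        if zlib.crc32(payload) & 0xFFFFFFFF != stored_crc:
            raise ValueError("CRC mismatch at offset %d" % pos)
        payloads.append(payload)
        pos = end
    return payloads, pos


if __name__ == "__main__":
    complete = encode_message(b"one") + encode_message(b"two")
    # Simulate a fetch response that cuts the third message short.
    response = complete + encode_message(b"three")[:-2]
    messages, consumed = parse_messages(response)
    print(messages, consumed)
```

In this sketch the consumer would advance its fetch offset by `consumed` only, so the truncated message is requested again in full on the next fetch instead of being parsed as garbage.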