From: Alexander Smirnov
Date: Wed, 18 Apr 2018 13:10:23 +0000
Subject: Re: Tracking deserialization errors
To: "Tzu-Li (Gordon) Tai"
Cc: Fabian Hueske, user, Elias Levy

Ouch, I forgot to mention that I opened https://issues.apache.org/jira/browse/FLINK-9155 to track this. Should it be marked as a duplicate of FLINK-9204 then?

On Wed, Apr 18, 2018 at 3:32 PM Tzu-Li (Gordon) Tai <tzulitai@apache.org> wrote:
Hi,

These are valid concerns. And yes, AFAIK users have been writing to logs within the deserialization schema to track this. The connectors as of now have no logging themselves in case of a skipped record.

I think we can implement both logging and metrics to track this, most of which you have already brought up.
For logging, the information should contain topic, partition, and offset for debugging.
For metrics, we should be able to use the user variable functionality to have skip counters that can be grouped by topic / partition / offset.

Though, I’m not sure how helpful this would be in practice.
I’ve opened a JIRA for this issue for further discussion: https://issues.apache.org/jira/browse/FLINK-9204

Cheers,
Gordon
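
The logging and metrics ideas in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SkipTracker`, `recordSkip`, and `skippedCount` are hypothetical names, and the plain in-memory map stands in for a real metrics system; none of it is Flink or connector API. Counters are grouped by topic and partition only, since every offset would otherwise get its own counter; the offset goes into the log line instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.logging.Logger;

// Hypothetical helper, not part of Flink: tracks skipped records per topic/partition.
final class SkipTracker {
    private static final Logger LOG = Logger.getLogger(SkipTracker.class.getName());

    // One counter per "topic/partition" group (stand-in for a real metric group).
    private final Map<String, LongAdder> skipped = new ConcurrentHashMap<>();

    // Call when a record could not be deserialized and is being skipped.
    void recordSkip(String topic, int partition, long offset, Throwable cause) {
        skipped.computeIfAbsent(topic + "/" + partition, k -> new LongAdder()).increment();
        // The log line carries topic, partition, and offset for later debugging.
        LOG.warning(String.format(
                "Skipping record: topic=%s partition=%d offset=%d cause=%s",
                topic, partition, offset, cause));
    }

    // Number of skips seen so far for one topic/partition group.
    long skippedCount(String topic, int partition) {
        LongAdder counter = skipped.get(topic + "/" + partition);
        return counter == null ? 0L : counter.sum();
    }
}
```

A caller would invoke `recordSkip(...)` at the point where a record is dropped and read the per-partition totals through `skippedCount(...)`.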

On 16 April 2018 at 7:43:00 PM, Fabian Hueske (fhueske@gmail.com) wrote:

Thanks for starting the discussion Elias.

I see two ways to address this issue.

1) Add an interface that a deserialization schema can implement to register metrics. Each source would need to check for the interface and call it to set up metrics.
2) Check for null returns in the source functions and increment a respective counter.

In both cases, we need to touch the source connectors.
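
Both options can be sketched in a few lines of standalone Java. Everything here is an assumption made for illustration: `Schema`, `MetricAware`, `SketchSource`, and `IntSchema` are hypothetical stand-ins rather than Flink types, and a plain map of counters plays the role of the metric registry.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a deserialization schema; not a Flink interface.
interface Schema<T> {
    T deserialize(byte[] bytes); // returns null for a record that should be skipped
}

// Option 1: a schema may additionally implement this to register its own metrics.
interface MetricAware {
    void registerMetrics(Map<String, AtomicLong> registry);
}

// Hypothetical source that applies both options.
final class SketchSource<T> {
    final Map<String, AtomicLong> registry = new HashMap<>();
    private final AtomicLong nullRecords = new AtomicLong();
    private final Schema<T> schema;

    SketchSource(Schema<T> schema) {
        this.schema = schema;
        registry.put("source.nullRecords", nullRecords);
        // Option 1: the source checks for the interface and calls it during setup.
        if (schema instanceof MetricAware) {
            ((MetricAware) schema).registerMetrics(registry);
        }
    }

    // Option 2: the source checks for null returns and increments a counter.
    List<T> poll(List<byte[]> rawRecords) {
        List<T> out = new ArrayList<>();
        for (byte[] raw : rawRecords) {
            T value = schema.deserialize(raw);
            if (value == null) {
                nullRecords.incrementAndGet();
            } else {
                out.add(value);
            }
        }
        return out;
    }
}

// Example schema: parses decimal integers and counts its own parse failures.
final class IntSchema implements Schema<Integer>, MetricAware {
    private final AtomicLong parseErrors = new AtomicLong();

    @Override
    public void registerMetrics(Map<String, AtomicLong> registry) {
        registry.put("schema.parseErrors", parseErrors);
    }

    @Override
    public Integer deserialize(byte[] bytes) {
        try {
            return Integer.valueOf(new String(bytes, StandardCharsets.UTF_8).trim());
        } catch (NumberFormatException e) {
            parseErrors.incrementAndGet();
            return null; // signal "skip this record" instead of failing
        }
    }
}
```

In this sketch the source has to change for either option, which matches the point that both approaches touch the source connectors.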

I see that information such as topic name, partition, and offset is important for debugging. However, I don't think metrics are a good way to capture it.
In that case, log files might be a better approach.

I'm not sure to what extent the source functions (Kafka, Kinesis) support such error tracking.
Adding Gordon, who knows the internals of the connectors, to the thread.

Best, Fabian

2018-04-08 17:53 GMT+02:00 Alexander Smirnov <alexander.smirnoff@gmail.com>:
I have the same question. In the case of the Kafka source, it would be good to know the topic name and offset of the corrupted message for further investigation.
It looks like the only option is to write messages into a log file.

On Fri, Apr 6, 2018 at 9:12 PM Elias Levy <fearsome.lucidity@gmail.com> wrote:
I was wondering how folks are tracking deserialization errors. The AbstractDeserializationSchema class provides no mechanism for the deserializer to instantiate a metric counter, and "deserialize" must return null instead of raising an exception in case of error if you want your job to continue functioning during a deserialization error. But that means such errors are invisible.

Thoughts?
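
The return-null pattern described in the question can be illustrated with a small wrapper. This is a sketch under assumptions, not Flink code: `ThrowingDeserializer` and `SkipOnError` are hypothetical names, and the failure counter lives only inside the wrapper object, so it is not a registered metric. Getting such a count into the metrics system is exactly the gap the question points at.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a deserializer that may throw; not a Flink interface.
interface ThrowingDeserializer<T> {
    T deserialize(byte[] message) throws Exception;
}

// Wraps a deserializer so that a failure yields null (the record is skipped)
// while the number of failures is still counted locally.
final class SkipOnError<T> {
    private final ThrowingDeserializer<T> delegate;
    private final AtomicLong failures = new AtomicLong();

    SkipOnError(ThrowingDeserializer<T> delegate) {
        this.delegate = delegate;
    }

    T deserialize(byte[] message) {
        try {
            return delegate.deserialize(message);
        } catch (Exception e) {
            failures.incrementAndGet();
            return null; // keep the job running instead of propagating the error
        }
    }

    long failureCount() {
        return failures.get();
    }
}
```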
