From: Rinat
To: user@flink.apache.org
Subject: [deserialization schema] skip data, that couldn't be properly deserialized
Date: Thu, 4 Oct 2018 20:14:09 +0300

Hi mates, according to the contract of org.apache.flink.api.common.serialization.DeserializationSchema, deserialize should return null when the content can't be deserialized.
But in most cases (for example, org.apache.flink.formats.avro.AvroDeserializationSchema) the method fails if the data is corrupted.

We've implemented our own SerDe class that returns null if the data doesn't satisfy the Avro schema, but it's rather hard to maintain this functionality when migrating to the latest Flink version.
What do you think: would it be useful to support optionally skipping failed records in the Avro and other deserializers in the source code?
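
To make the idea concrete, here is a minimal sketch of the kind of wrapper described above: it delegates to a strict deserializer and returns null on failure instead of throwing. It is only an illustration under stated assumptions. RecordDeserializer is a hypothetical stand-in for Flink's DeserializationSchema (reduced to a single method so the example compiles without Flink), and SkipOnFailureDeserializer, STRICT_INT and decodeAll are names invented for this example, not part of Flink.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Flink's DeserializationSchema, reduced to the one
// method discussed here so the sketch compiles without Flink on the classpath.
interface RecordDeserializer<T> {
    // Returns the decoded record, or null if the message cannot be deserialized.
    T deserialize(byte[] message) throws IOException;
}

// Wraps any deserializer and turns failures into null, i.e. "skip this record".
final class SkipOnFailureDeserializer<T> implements RecordDeserializer<T> {
    private final RecordDeserializer<T> delegate;
    private long skipped; // number of records dropped so far

    SkipOnFailureDeserializer(RecordDeserializer<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T deserialize(byte[] message) {
        try {
            return delegate.deserialize(message);
        } catch (IOException | RuntimeException e) {
            skipped++;
            return null;
        }
    }

    long skippedCount() {
        return skipped;
    }
}

final class SkipCorruptRecordsDemo {
    // Toy strict decoder (an assumption for the demo): parses a UTF-8 decimal
    // integer and throws on anything else, like a schema that rejects bad data.
    static final RecordDeserializer<Integer> STRICT_INT = bytes -> {
        try {
            return Integer.parseInt(new String(bytes, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            throw new IOException("corrupt record", e);
        }
    };

    // Decodes a batch and drops null results, the way a consumer that honors
    // the "null means skip" contract might.
    static <T> List<T> decodeAll(RecordDeserializer<T> schema, List<byte[]> messages)
            throws IOException {
        List<T> out = new ArrayList<>();
        for (byte[] message : messages) {
            T value = schema.deserialize(message);
            if (value != null) {
                out.add(value);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        SkipOnFailureDeserializer<Integer> lenient =
                new SkipOnFailureDeserializer<>(STRICT_INT);
        List<byte[]> messages = Arrays.asList(
                "1".getBytes(StandardCharsets.UTF_8),
                "oops".getBytes(StandardCharsets.UTF_8),
                "3".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeAll(lenient, messages) + " decoded, "
                + lenient.skippedCount() + " skipped");
    }
}
```

Keeping the skip logic in a wrapper rather than inside each format's deserializer would make it opt-in, which is roughly what "optional skip" in the question asks for. A real version would probably also log or count the dropped records so that silent data loss stays visible.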

Sincerely yours,
Rinat Sharipov
Software Engineer at 1DMP CORE Team

email: r.sharipov@cleverdata.ru
mobile: +7 (925) 416-37-26

CleverDATA
make your data clever
