Date: Wed, 29 Mar 2017 18:30:41 +0000 (UTC)
From: "Eugene Kirpichov (JIRA)"
To: commits@beam.apache.org
Reply-To: dev@beam.apache.org
Subject: [jira] [Closed] (BEAM-73) IO design pattern: Decouple Parsers and Coders

     [ https://issues.apache.org/jira/browse/BEAM-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Kirpichov closed BEAM-73.
--------------------------------
    Resolution: Duplicate

The only remaining instance of this is in KafkaIO, handled by BEAM-1573.

> IO design pattern: Decouple Parsers and Coders
> ----------------------------------------------
>
>                 Key: BEAM-73
>                 URL: https://issues.apache.org/jira/browse/BEAM-73
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Daniel Halperin
>            Priority: Minor
>              Labels: backward-incompatible
>             Fix For: First stable release
>
>
> Many Sources can be thought of as providing a byte[] payload -- e.g. TextIO bytes between newlines, or PubSubIO messages. Therefore, we originally suggested a Coder as the thing to use to decode these byte[] into T (what I'll call Parsing).
> Consider the case of a text file of integers.
> 123\n
> 456\n
> ...
> We want a PCollection<Integer> out, so we can use TextualIntegerCoder with TextIO.Read.
> However, that Coder will get propagated as the default coder for that PCollection (and may be used in downstream DoFns). This seems bad because, once the data is parsed, we probably want to use VarIntCoder or another Coder that is more CPU- and space-efficient.
> Another design pattern is
>   TextIO.Read() -> MapElements (lambda s : Integer.parseInt(s))
> This has better behavior, but now we go from byte[] to String to Integer rather than directly from byte[] to Integer.
> The solution seems to be to explicitly add Parser and Coder abstractions.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
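
The Parser and Coder split proposed in the quoted issue can be sketched in plain Java. This is only an illustration under stated assumptions, not Beam's API: the names ParserCoderSketch, Parser, Coder, TEXT_INT_PARSER, VAR_INT_CODER and readLines are all hypothetical, and the variable-length encoding is a generic base-128 scheme in the spirit of VarIntCoder, not necessarily its wire format. The parser goes from byte[] to Integer with no intermediate String, and a separate coder handles the parsed values.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types are part of the Beam SDK.
final class ParserCoderSketch {

    /** Turns one raw source record (e.g. the bytes between newlines) into a T. */
    interface Parser<T> {
        T parse(byte[] record);
    }

    /** Encodes and decodes already-parsed values between pipeline stages. */
    interface Coder<T> {
        byte[] encode(T value);
        T decode(byte[] bytes);
    }

    /** Parses ASCII decimal digits straight from bytes, without building a String. */
    static final Parser<Integer> TEXT_INT_PARSER = record -> {
        int i = (record.length > 0 && record[0] == '-') ? 1 : 0;
        if (i == record.length) {
            throw new NumberFormatException("no digits in record");
        }
        long magnitude = 0;
        for (; i < record.length; i++) {
            int digit = record[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit: " + (char) record[i]);
            }
            magnitude = magnitude * 10 + digit;
            if (magnitude > (long) Integer.MAX_VALUE + 1) {
                throw new NumberFormatException("value out of int range");
            }
        }
        long value = (record[0] == '-') ? -magnitude : magnitude;
        if (value > Integer.MAX_VALUE) {
            throw new NumberFormatException("value out of int range");
        }
        return (int) value;
    };

    /** Base-128 variable-length encoding: 7 payload bits per byte, high bit = "more". */
    static final Coder<Integer> VAR_INT_CODER = new Coder<Integer>() {
        @Override
        public byte[] encode(Integer value) {
            ByteArrayOutputStream out = new ByteArrayOutputStream(5);
            int v = value;
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v);
            return out.toByteArray();
        }

        @Override
        public Integer decode(byte[] bytes) {
            int result = 0;
            int shift = 0;
            for (byte b : bytes) {
                result |= (b & 0x7F) << shift;
                shift += 7;
            }
            return result;
        }
    };

    /** Splits a newline-delimited payload into records and parses each non-empty one. */
    static <T> List<T> readLines(byte[] payload, Parser<T> parser) {
        List<T> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= payload.length; i++) {
            if (i == payload.length || payload[i] == '\n') {
                if (i > start) {
                    out.add(parser.parse(Arrays.copyOfRange(payload, start, i)));
                }
                start = i + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] file = "123\n456\n".getBytes(StandardCharsets.US_ASCII);
        for (Integer v : readLines(file, TEXT_INT_PARSER)) {
            byte[] encoded = VAR_INT_CODER.encode(v);
            System.out.println(v + " -> " + encoded.length + " byte(s), decodes to "
                    + VAR_INT_CODER.decode(encoded));
        }
    }
}
```

Because the parser is passed to the read step and the coder is chosen separately, the text format of the source never becomes the default encoding of the parsed collection, which is the concern the issue raises about TextualIntegerCoder.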