From commits-return-55630-archive-asf-public=cust-asf.ponee.io@beam.apache.org Thu Jan 11 19:47:05 2018 Return-Path: X-Original-To: archive-asf-public@eu.ponee.io Delivered-To: archive-asf-public@eu.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by mx-eu-01.ponee.io (Postfix) with ESMTP id 44FCC180656 for ; Thu, 11 Jan 2018 19:47:05 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 35026160C13; Thu, 11 Jan 2018 18:47:05 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 56308160C23 for ; Thu, 11 Jan 2018 19:47:04 +0100 (CET) Received: (qmail 14788 invoked by uid 500); 11 Jan 2018 18:47:03 -0000 Mailing-List: contact commits-help@beam.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@beam.apache.org Delivered-To: mailing list commits@beam.apache.org Received: (qmail 14779 invoked by uid 99); 11 Jan 2018 18:47:03 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 11 Jan 2018 18:47:03 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 14E111A0EE4 for ; Thu, 11 Jan 2018 18:47:03 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.911 X-Spam-Level: X-Spam-Status: No, score=-99.911 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id Lrs2tgrJLDvx for ; Thu, 11 Jan 2018 18:47:01 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 3566C5F3CD for ; Thu, 11 Jan 2018 18:47:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id B1C5DE2578 for ; Thu, 11 Jan 2018 18:47:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 5A947255CC for ; Thu, 11 Jan 2018 18:47:00 +0000 (UTC) Date: Thu, 11 Jan 2018 18:47:00 +0000 (UTC) From: "Robert Burke (JIRA)" To: commits@beam.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (BEAM-3458) Go SDK beam.Create & beam.CreateList should support complex types MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/BEAM-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke updated BEAM-3458: ------------------------------- Description: beam.Create and beam.CreateList when used with complex types do not survive pipeline serialization and deserliazation such as when the values are being decoded on a remote runner. Such an ability is useful for providing static data, or known at construction time data to the pipeline. The following works as expected in the direct go runner, which doesn't serialize and deserialize the pipeline, but fails remotely. The pipeline typechecks correctly. {code:java} type wordCount struct { K string V int } func splitToKV(e wordCount) (string,int) { return e.K, e.V } p := beam.NewPipeline() s := p.Root() list := beam.CreateList(s, []wordCount{{"a", 23},{"b", 42},{"c", 5}}} kvs := beam.ParDo(s, splitToKV, list) {code} ... rest of pipeline... The pipeline will try to execute the splitToKV pardo, and will panic when trying to use the JSON decoded values. Specifically, the beam.Create generated createFn only has a field of []interface, which when used with the JSON unmarshaller, will use map[string]interface instead for each value (as per the godoc for encoding/json). The reflect library will then panic when trying to conver these map[string]interface values to wordCount structs for the splitToKV function. This sort of thing will occur whenever a structural DoFn uses interface{} types to persist values to runners, since the underlying type information is lost in the encoding done by serialize.go However, the types are known at construction time, either directly, or by the type checker when using Universal types, so the true underlying type could be encoded, and then used in the decoding process before storing them in the dematerialized structural DoFn. A user can currently work around this by manually JSON encoding their structs to strings, and manually decoding them in their pipeline, but would need specialized code for each type used this way. was: beam.Create and beam.CreateList when used with complex types do not survive pipeline serialization and deserliazation such as when the values are being decoded on a remote runner. Such an ability is useful for providing static data, or known at construction time data to the pipeline. The following works as expected in the direct go runner, which doesn't serialize and deserialize the pipeline, but fails remotely. The pipeline typechecks correctly. {code:go} type wordCount struct { K string V int } func splitToKV(e wordCount) (string,int) { return e.K, e.V } p := beam.NewPipeline() s := p.Root() list := beam.CreateList(s, []wordCount{{"a", 23},{"b", 42},{"c", 5}}} kvs := beam.ParDo(s, splitToKV, list) {code} ... rest of pipeline... The pipeline will try to execute the splitToKV pardo, and will panic when trying to use the JSON decoded values. Specifically, the beam.Create generated createFn only has a field of []interface, which when used with the JSON unmarshaller, will use map[string]interface instead for each value (as per the godoc for encoding/json). The reflect library will then panic when trying to conver these map[string]interface values to wordCount structs for the splitToKV function. This sort of thing will occur whenever a structural DoFn uses interface{} types to persist values to runners, since the underlying type information is lost in the encoding done by serialize.go However, the types are known at construction time, either directly, or by the type checker when using Universal types, so the true underlying type could be encoded, and then used in the decoding process before storing them in the dematerialized structural DoFn. A user can currently work around this by manually JSON encoding their structs to strings, and manually decoding them in their pipeline, but would need specialized code for each type used this way. > Go SDK beam.Create & beam.CreateList should support complex types > ----------------------------------------------------------------- > > Key: BEAM-3458 > URL: https://issues.apache.org/jira/browse/BEAM-3458 > Project: Beam > Issue Type: Bug > Components: sdk-go > Affects Versions: Not applicable > Reporter: Robert Burke > Assignee: Henning Rohde > > beam.Create and beam.CreateList when used with complex types do not survive pipeline serialization and deserliazation such as when the values are being decoded on a remote runner. > Such an ability is useful for providing static data, or known at construction time data to the pipeline. > The following works as expected in the direct go runner, which doesn't serialize and deserialize the pipeline, but fails remotely. The pipeline typechecks correctly. > {code:java} > type wordCount struct { > K string > V int > } > func splitToKV(e wordCount) (string,int) { > return e.K, e.V > } > p := beam.NewPipeline() > s := p.Root() > list := beam.CreateList(s, []wordCount{{"a", 23},{"b", 42},{"c", 5}}} > kvs := beam.ParDo(s, splitToKV, list) > {code} > ... rest of pipeline... > The pipeline will try to execute the splitToKV pardo, and will panic when trying to use the JSON decoded values. Specifically, the beam.Create generated createFn only has a field of []interface, which when used with the JSON unmarshaller, will use map[string]interface instead for each value (as per the godoc for encoding/json). > The reflect library will then panic when trying to conver these map[string]interface values to wordCount structs for the splitToKV function. > This sort of thing will occur whenever a structural DoFn uses interface{} types to persist values to runners, since the underlying type information is lost in the encoding done by serialize.go > However, the types are known at construction time, either directly, or by the type checker when using Universal types, so the true underlying type could be encoded, and then used in the decoding process before storing them in the dematerialized structural DoFn. > A user can currently work around this by manually JSON encoding their structs to strings, and manually decoding them in their pipeline, but would need specialized code for each type used this way. -- This message was sent by Atlassian JIRA (v6.4.14#64029)