Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id E018A200CD6 for ; Mon, 31 Jul 2017 19:27:05 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id DEDCC165994; Mon, 31 Jul 2017 17:27:05 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 30012165995 for ; Mon, 31 Jul 2017 19:27:05 +0200 (CEST) Received: (qmail 32657 invoked by uid 500); 31 Jul 2017 17:27:04 -0000 Mailing-List: contact commits-help@beam.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@beam.apache.org Delivered-To: mailing list commits@beam.apache.org Received: (qmail 32648 invoked by uid 99); 31 Jul 2017 17:27:04 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 31 Jul 2017 17:27:04 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id CEE7D1A1FFD for ; Mon, 31 Jul 2017 17:27:03 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.202 X-Spam-Level: X-Spam-Status: No, score=-99.202 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id R1MNDm7nLXtn for ; Mon, 31 Jul 2017 17:27:01 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 470475FB40 for ; Mon, 31 Jul 2017 17:27:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 952EDE0059 for ; Mon, 31 Jul 2017 17:27:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 514F12464A for ; Mon, 31 Jul 2017 17:27:00 +0000 (UTC) Date: Mon, 31 Jul 2017 17:27:00 +0000 (UTC) From: "Thomas Groh (JIRA)" To: commits@beam.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (BEAM-2699) AppliedPTransform is used as a key in hashmaps but PTransform is not hashable/equality-comparable MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Mon, 31 Jul 2017 17:27:06 -0000 [ https://issues.apache.org/jira/browse/BEAM-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107634#comment-16107634 ] Thomas Groh commented on BEAM-2699: ----------------------------------- The latter claim is incorrect: two applications of the same transform to the same input will produce different output (unless the transform's output is a subset of it's input; in which case the two applications are indistinguishable (same input, same transform, same output), and reusing the same references cannot be observed. Additionally, applications within a pipeline are forced to have unique names, so no two transforms will ever incorrectly collide. The former claim is more worrying; failing to find the same application is a pretty big problem. Potentially equals/hashCode should be (Name + Pipeline)? As an aside, PTransforms shouldn't throw an exception if you call equals or hashCode on them, just return a more-discriminating response. > AppliedPTransform is used as a key in hashmaps but PTransform is not hashable/equality-comparable > ------------------------------------------------------------------------------------------------- > > Key: BEAM-2699 > URL: https://issues.apache.org/jira/browse/BEAM-2699 > Project: Beam > Issue Type: Bug > Components: runner-core > Reporter: Eugene Kirpichov > Assignee: Thomas Groh > > There's plenty of occurrences in runners-core of Map or BiMap where the key is an AppliedPTransform. > However, PTransform does not advertise that it is required to implement equals/hashCode, and some transforms can't do it properly anyway - for example, transforms that capture a ValueProvider which is also not hashable/eq-comparable. I'm surprised that things aren't already very broken because of this. > Fundamentally, I don't see why we should ever compare two PTransform's for equality. > I looked at the code and wondered "can AppliedPTransform simply be identity-hashable", but right now the answer is no because we can create an AppliedPTransform for the same transform applied to the same thing multiple times. > Fixing that appears to be not very easy, but definitely possible. Ideally TransformHierarchy.Node would just know its AppliedPTransform, however a Node can be constructed when there's yet no Pipeline. Suppose there's gotta be some way to propagate a Pipeline into Node.finishSpecifying() (which should be called exactly once on the Node, and this should be enforced), and have finishSpecifying() return the AppliedPTransform, and have the caller use that instead of potentially repeatedly calling .toAppliedPTransform() on the same Node. > [~kenn] is on vacation but perhaps [~tgroh] can help with this meanwhile? > CC: [~reuvenlax] -- This message was sent by Atlassian JIRA (v6.4.14#64029)