beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2699) AppliedPTransform is used as a key in hashmaps but PTransform is not hashable/equality-comparable
Date Thu, 14 Sep 2017 17:52:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166701#comment-16166701
] 

ASF GitHub Bot commented on BEAM-2699:
--------------------------------------

GitHub user tgroh opened a pull request:

    https://github.com/apache/beam/pull/3853

    [BEAM-2699] Update AppliedPTransform Equality

    Follow this checklist to help us incorporate your contribution quickly and easily:
    
     - [ ] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/)
filed for the change (usually before you start working on it).  Trivial changes like typos
do not require a JIRA issue.  Your pull request should address just this issue, without pulling
in other changes.
     - [ ] Each commit in the pull request should have a meaningful subject line and body.
     - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`,
where you replace `BEAM-XXX` with the appropriate JIRA issue.
     - [ ] Write a pull request description that is detailed enough to understand what the
pull request does, how, and why.
     - [ ] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will
be performed on your pull request automatically.
     - [ ] If this contribution is large, please file an Apache [Individual Contributor License
Agreement](https://www.apache.org/licenses/icla.pdf).
    
    ---
    Use only node name and the pipeline the two transforms are in. Do not
    consider the reported equality of inputs, outputs, or transforms.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgroh/beam update_application_equality

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3853.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3853
    
----
commit 6615badce6f3dd2835ac8eac94d792595e86a576
Author: Thomas Groh <tgroh@google.com>
Date:   2017-09-14T17:50:09Z

    Update AppliedPTransform Equality
    
    Use only node name and the pipeline the two transforms are in. Do not
    consider the reported equality of inputs, outputs, or transforms.

----


> AppliedPTransform is used as a key in hashmaps but PTransform is not hashable/equality-comparable
> -------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-2699
>                 URL: https://issues.apache.org/jira/browse/BEAM-2699
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-core
>            Reporter: Eugene Kirpichov
>            Assignee: Thomas Groh
>
> There's plenty of occurrences in runners-core of Map or BiMap where the key is an AppliedPTransform.
> However, PTransform does not advertise that it is required to implement equals/hashCode,
and some transforms can't do it properly anyway - for example, transforms that capture a ValueProvider
which is also not hashable/eq-comparable. I'm surprised that things aren't already very broken
because of this.
> Fundamentally, I don't see why we should ever compare two PTransform's for equality.
> I looked at the code and wondered "can AppliedPTransform simply be identity-hashable",
but right now the answer is no because we can create an AppliedPTransform for the same transform
applied to the same thing multiple times.
> Fixing that appears to be not very easy, but definitely possible. Ideally TransformHierarchy.Node
would just know its AppliedPTransform, however a Node can be constructed when there's yet
no Pipeline. Suppose there's gotta be some way to propagate a Pipeline into Node.finishSpecifying()
(which should be called exactly once on the Node, and this should be enforced), and have finishSpecifying()
return the AppliedPTransform, and have the caller use that instead of potentially repeatedly
calling .toAppliedPTransform() on the same Node.
> [~kenn] is on vacation but perhaps [~tgroh] can help with this meanwhile?
> CC: [~reuvenlax]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message