beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eugene Kirpichov (JIRA)" <>
Subject [jira] [Commented] (BEAM-2699) AppliedPTransform is used as a key in hashmaps but PTransform is not hashable/equality-comparable
Date Sat, 29 Jul 2017 00:38:00 GMT


Eugene Kirpichov commented on BEAM-2699:

By the way, this was found when testing a PR where DirectRunner to/from proto machinery ended
up hashing a transform that captured a null ValueProvider and used it in hashCode() as "provider.get()"
- this threw an NPE, but generally this code shouldn't have been called at all at construction
time, because ValueProvider could have been inaccessible. On the other hand, hashing the ValueProvider
itself is also incorrect (and throws errors in runners-core instead) because ValueProvider
does not implement hashCode/equals.

> AppliedPTransform is used as a key in hashmaps but PTransform is not hashable/equality-comparable
> -------------------------------------------------------------------------------------------------
>                 Key: BEAM-2699
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-core
>            Reporter: Eugene Kirpichov
>            Assignee: Thomas Groh
> There's plenty of occurrences in runners-core of Map or BiMap where the key is an AppliedPTransform.
> However, PTransform does not advertise that it is required to implement equals/hashCode,
and some transforms can't do it properly anyway - for example, transforms that capture a ValueProvider
which is also not hashable/eq-comparable. I'm surprised that things aren't already very broken
because of this.
> Fundamentally, I don't see why we should ever compare two PTransform's for equality.
> I looked at the code and wondered "can AppliedPTransform simply be identity-hashable",
but right now the answer is no because we can create an AppliedPTransform for the same transform
applied to the same thing multiple times.
> Fixing that appears to be not very easy, but definitely possible. Ideally TransformHierarchy.Node
would just know its AppliedPTransform, however a Node can be constructed when there's yet
no Pipeline. Suppose there's gotta be some way to propagate a Pipeline into Node.finishSpecifying()
(which should be called exactly once on the Node, and this should be enforced), and have finishSpecifying()
return the AppliedPTransform, and have the caller use that instead of potentially repeatedly
calling .toAppliedPTransform() on the same Node.
> [~kenn] is on vacation but perhaps [~tgroh] can help with this meanwhile?
> CC: [~reuvenlax]

This message was sent by Atlassian JIRA

View raw message