crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Resolved] (CRUNCH-483) Scrunch .map does not allow mapping to a PCollection[(A,B)]
Date Sat, 03 Jan 2015 20:45:34 GMT


Josh Wills resolved CRUNCH-483.
       Resolution: Fixed
    Fix Version/s: 0.12.0

Back from vacation and slowly putting myself back to work. Thanks for this one, David!

> Scrunch .map does not allow mapping to a PCollection[(A,B)]
> -----------------------------------------------------------
>                 Key: CRUNCH-483
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>          Components: Scrunch
>    Affects Versions: 0.11.0
>            Reporter: David Whiting
>            Priority: Minor
>             Fix For: 0.12.0
>         Attachments: 0001-Add-asPCollection-method-to-PTable-and-corresponding.patch
> When using Scrunch PCollections and attempting to map to a pair of values, the keyvalue
> implicit function in CanParallelDo will "upgrade" the result to a PTable[K, V]. This is
> often the desired behaviour, but because Scrunch PTable is not an extension of Scrunch
> PCollection, there are cases where this is not what is wanted.
> Concrete example from music land: I am trying to count the number of plays for each track
> in each country. I want to do this:
> trackPlayedMessage(tpm => (tpm.track,
> However, because of the implicit CanParallelTransform that is substituted, I cannot call
> .count(), because what I get is a PTable and not a PCollection.
> There are a number of possible remedies that I'm happy to have a go at, but I'd like
> some input as to which would be best:
> - Make PTable[K,V] a real extension of PCollection[(K, V)] (analogous to how it works
>   in Crunch)
> - Add an "asPCollection" method to PTable which "downgrades" the PTable[K, V] to a PCollection[(K,
> - Make mapToTable and flatMapToTable distinct from map and flatMap to make the choice
>   explicit (warning: breaks existing API).
> - Expose an equivalent to LowPriorityParallelTransforms.single to be invoked explicitly
>   to get a collection instead of a table using .map(fn)(implicitly, single)
> - Something else
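
For readers unfamiliar with the mechanism the report describes, here is a minimal, self-contained sketch of the implicit-priority pattern. It is a toy model and not Scrunch's actual code: Coll, Table, Transform, LowPriorityTransforms, Play and asColl are hypothetical stand-ins, and only the names keyvalue and single are borrowed from the report. Under those assumptions, it shows why mapping to a pair yields a table, and how two of the proposed remedies (a "downgrade" method, and passing single explicitly) might recover a collection.

```scala
object ImplicitPriorityDemo {
  // Toy stand-in for a collection type (hypothetical, not Scrunch's PCollection).
  final case class Coll[A](items: Seq[A]) {
    // The result type To is chosen by whichever implicit Transform is selected.
    def map[B, To](f: A => B)(implicit t: Transform[B, To]): To = t(items.map(f))
    // Counts occurrences of each distinct element.
    def count(): Map[A, Int] = items.groupBy(identity).map { case (k, v) => (k, v.size) }
  }

  // Toy stand-in for a table type; deliberately NOT a subtype of Coll and has no count().
  final case class Table[K, V](items: Seq[(K, V)]) {
    // Hypothetical "downgrade" back to a plain collection of pairs.
    def asColl: Coll[(K, V)] = Coll(items)
  }

  trait Transform[El, To] { def apply(items: Seq[El]): To }

  // Lower priority: any element type maps to a plain collection.
  trait LowPriorityTransforms {
    implicit def single[B]: Transform[B, Coll[B]] = new Transform[B, Coll[B]] {
      def apply(items: Seq[B]): Coll[B] = Coll(items)
    }
  }

  // Higher priority (defined in the derived companion): pairs are "upgraded" to a table.
  object Transform extends LowPriorityTransforms {
    implicit def keyvalue[K, V]: Transform[(K, V), Table[K, V]] =
      new Transform[(K, V), Table[K, V]] {
        def apply(items: Seq[(K, V)]): Table[K, V] = Table(items)
      }
  }

  // Made-up sample record for illustration only.
  final case class Play(track: String, country: String)

  def main(args: Array[String]): Unit = {
    val plays = Seq(Play("t1", "SE"), Play("t1", "SE"), Play("t2", "US"))

    // keyvalue wins implicit resolution, so the result is a Table, not a Coll.
    val upgraded = Coll(plays).map(p => (p.track, p.country))
    val asTable: Table[String, String] = upgraded

    // Remedy sketch 1: downgrade the table, then count.
    val viaDowngrade = upgraded.asColl.count()

    // Remedy sketch 2: pass the low-priority instance explicitly.
    val viaSingle =
      Coll(plays).map(p => (p.track, p.country))(Transform.single[(String, String)]).count()

    assert(viaDowngrade(("t1", "SE")) == 2)
    assert(viaSingle == viaDowngrade)
    println(viaSingle)
  }
}
```

In this sketch the choice between table and collection is made entirely at compile time by implicit resolution, which is why a caller who wants the collection has to opt out explicitly.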

This message was sent by Atlassian JIRA
