Date: Thu, 18 Dec 2014 12:57:13 +0000 (UTC)
From: "David Whiting (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Subject: [jira] [Updated] (CRUNCH-483) Scrunch .map does not allow mapping to a PCollection[(A,B)]

     [ https://issues.apache.org/jira/browse/CRUNCH-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Whiting updated CRUNCH-483:
---------------------------------
    Attachment: 0001-Add-asPCollection-method-to-PTable-and-corresponding.patch

Attached patch for the "second best" option, as making PTable a PCollection is indeed problematic.
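For readers without the patch to hand, here is a minimal toy sketch of the idea. It is not the attached patch and does not use the real Scrunch classes: ToyScrunch, CanMap, LowPriorityCanMap and the in-memory Seq-backed PCollection/PTable are all hypothetical stand-ins, and count() returns a plain Map here. It only illustrates how a higher-priority implicit "upgrades" a mapped pair to a table, and how an asPCollection-style method could "downgrade" it again.

```scala
// Toy model only: these are NOT the real Scrunch types or signatures.
object ToyScrunch {
  final case class PCollection[S](values: Seq[S]) {
    // The result type T is picked by whichever implicit CanMap the compiler finds.
    def map[B, T](f: S => B)(implicit ev: CanMap[B, T]): T = ev(values.map(f))

    // Stand-in for count(): number of occurrences of each distinct element.
    def count(): Map[S, Long] =
      values.groupBy(identity).map { case (k, v) => k -> v.size.toLong }
  }

  final case class PTable[K, V](pairs: Seq[(K, V)]) {
    // Hypothetical shape of the "downgrade" from PTable[K, V] to PCollection[(K, V)].
    def asPCollection: PCollection[(K, V)] = PCollection(pairs)
  }

  trait CanMap[B, T] { def apply(bs: Seq[B]): T }

  // Lower priority: any element type maps to a plain collection.
  trait LowPriorityCanMap {
    implicit def single[B]: CanMap[B, PCollection[B]] =
      new CanMap[B, PCollection[B]] {
        def apply(bs: Seq[B]): PCollection[B] = PCollection(bs)
      }
  }

  // Higher priority: pairs are "upgraded" to a table.
  object CanMap extends LowPriorityCanMap {
    implicit def keyvalue[K, V]: CanMap[(K, V), PTable[K, V]] =
      new CanMap[(K, V), PTable[K, V]] {
        def apply(bs: Seq[(K, V)]): PTable[K, V] = PTable(bs)
      }
  }
}

object Demo {
  import ToyScrunch._

  final case class Play(track: String, country: String)

  def main(args: Array[String]): Unit = {
    val plays = PCollection(Seq(Play("a", "SE"), Play("a", "SE"), Play("b", "UK")))

    // keyvalue wins, so mapping to a pair yields a table, which has no count() here.
    val table: PTable[String, String] = plays.map(p => (p.track, p.country))

    // Downgrading makes count() over (track, country) pairs available again.
    println(table.asPCollection.count())
  }
}
```

In this toy model, passing the low-priority instance explicitly, as in plays.map(p => (p.track, p.country))(CanMap.single), also yields a PCollection, which corresponds to the "expose single" option listed in the issue.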
> Scrunch .map does not allow mapping to a PCollection[(A,B)]
> -----------------------------------------------------------
>
>                 Key: CRUNCH-483
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-483
>             Project: Crunch
>          Issue Type: Bug
>          Components: Scrunch
>    Affects Versions: 0.11.0
>            Reporter: David Whiting
>            Priority: Minor
>         Attachments: 0001-Add-asPCollection-method-to-PTable-and-corresponding.patch
>
>
> When using Scrunch PCollections and attempting to map to a pair of values, the keyvalue implicit function in CanParallelDo will "upgrade" the result to a PTable[K, V]. This is often the desired behaviour, but because Scrunch PTable does not extend Scrunch PCollection, there are cases where it is not what is wanted.
> Concrete example from music land: I am trying to count the number of plays for each track in each country. I want to do this:
> trackPlayedMessage(tpm => (tpm.track, tpm.country)).count()
> However, because of the implicit CanParallelTransform that is substituted, I cannot call .count(): what I get is a PTable and not a PCollection.
> There are a number of possible remedies that I'm happy to have a go at, but I'd like some input as to which would be best:
> - Make PTable[K,V] a real extension of PCollection[(K, V)] (analogous to how it works in Crunch)
> - Add an "asPCollection" method to PTable which "downgrades" the PTable[K, V] to a PCollection[(K, V)].
> - Make mapToTable and flatMapToTable distinct from map and flatMap to make the choice explicit (warning: breaks existing API).
> - Expose an equivalent to LowPriorityParallelTransforms.single to be invoked explicitly to get a collection instead of a table using .map(fn)(implicitly, single)
> - Something else

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)