crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: ExtractKeyFn scaleFactor seems to be incorrect
Date Thu, 21 May 2015 19:42:14 GMT
Yes, you're right. Could you file a JIRA for it?

J

On Thu, May 21, 2015 at 10:48 AM, Patel,Stephen <Stephen.Patel@cerner.com>
wrote:

>   I was looking at the PCollectionImpl.by method[0] today, and I think
> that the ExtractKeyFn[1] it's using may not be calculating scaleFactor
> correctly.  The ExtractKeyFn is using the default scaleFactor for a MapFn
> (1.0), but shouldn't it have a scaleFactor of 1 + the input MapFn's
> scaleFactor?
>
>  As an example, if you had a PCollection<T> and you call by with the
> IdentityFn, the returned table should have a size of 2 * the original
> collection's size, but as it stands now, it will have the same size as the
> original.
>
>  Assuming we later group a table that we constructed with by, won't we
> use (potentially) far fewer reducers than we actually should be?
>
>  [0]:
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/impl/dist/collect/PCollectionImpl.java#L270
> [1]:
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/fn/ExtractKeyFn.java
>

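For readers of the archive, the arithmetic in the quoted message can be sketched in plain Java. This is only an illustration under stated assumptions, not the Crunch source or the eventual fix: SimpleMapFn, IdentitySketchFn, ExtractKeySketchFn and estimateSize are hypothetical stand-ins, and the only facts taken from the thread are that a MapFn defaults to a scaleFactor of 1.0 and that the proposed value is 1 + the input MapFn's scaleFactor.

```java
import java.util.AbstractMap;
import java.util.Map;

public class ScaleFactorSketch {

    /** Hypothetical stand-in for a MapFn: one output per input, default scale 1.0. */
    abstract static class SimpleMapFn<S, T> {
        abstract T map(S input);

        float scaleFactor() {
            return 1.0f;
        }
    }

    /** Hypothetical identity function: returns its input unchanged. */
    static class IdentitySketchFn<T> extends SimpleMapFn<T, T> {
        @Override
        T map(T input) {
            return input;
        }
    }

    /** Hypothetical key extractor: emits (key, value), so output holds the value plus the key. */
    static class ExtractKeySketchFn<K, V> extends SimpleMapFn<V, Map.Entry<K, V>> {
        private final SimpleMapFn<V, K> keyFn;

        ExtractKeySketchFn(SimpleMapFn<V, K> keyFn) {
            this.keyFn = keyFn;
        }

        @Override
        Map.Entry<K, V> map(V input) {
            return new AbstractMap.SimpleImmutableEntry<>(keyFn.map(input), input);
        }

        // The scaling proposed in the thread: 1 (the value itself) + the key function's scale.
        @Override
        float scaleFactor() {
            return 1.0f + keyFn.scaleFactor();
        }
    }

    /** Hypothetical size estimate: input size multiplied by the function's scale factor. */
    static long estimateSize(long inputSize, SimpleMapFn<?, ?> fn) {
        return (long) (inputSize * fn.scaleFactor());
    }

    public static void main(String[] args) {
        SimpleMapFn<String, String> identity = new IdentitySketchFn<>();
        long inputSize = 1000L;
        System.out.println("identity only: " + estimateSize(inputSize, identity));
        System.out.println("keyed by identity: "
                + estimateSize(inputSize, new ExtractKeySketchFn<>(identity)));
    }
}
```

With the override in place, keying a collection by the identity function doubles the estimated size, which matches the 2 * figure in the example above. Without it, the estimate stays at the input size.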