crunch-user mailing list archives

From Josh Wills
Subject Re: Best Practice for Materialization
Date Wed, 24 Feb 2016 20:20:35 GMT
Turn the PCollection into a ReadableData object (via asReadable), which is
serializable and can be passed into a DoFn and read in during initialization
for use during processing. That's how the MapsideJoin stuff is implemented.
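
A minimal sketch of that approach, assuming a small PTable<String, String> as
the lookup set and a PCollection<String> of keys to annotate. The names
LookupExample, LookupFn, and annotate are hypothetical and chosen only for
illustration; the Crunch calls used (asReadable, ReadableData.configure/read,
ParallelDoOptions.sourceTargets) are the same ones the map-side join relies on,
but check them against the Crunch version you are running.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.ParallelDoOptions;
import org.apache.crunch.ReadableData;
import org.apache.hadoop.conf.Configuration;

public class LookupExample {

  /** Hypothetical DoFn that tags each input key with a value from the small data set. */
  static class LookupFn extends DoFn<String, String> {
    // ReadableData is Serializable, so it travels with the DoFn to the tasks.
    private final ReadableData<Pair<String, String>> lookupData;
    // Built per task in initialize(); transient so it is never serialized.
    private transient Map<String, String> lookup;

    LookupFn(ReadableData<Pair<String, String>> lookupData) {
      this.lookupData = lookupData;
    }

    @Override
    public void configure(Configuration conf) {
      // Lets the readable register whatever it needs in the job configuration.
      lookupData.configure(conf);
    }

    @Override
    public void initialize() {
      super.initialize();
      lookup = new HashMap<String, String>();
      try {
        // Read the small data set once per task, before any process() call.
        for (Pair<String, String> kv : lookupData.read(getContext())) {
          lookup.put(kv.first(), kv.second());
        }
      } catch (IOException e) {
        throw new IllegalStateException("Could not read lookup data", e);
      }
    }

    @Override
    public void process(String input, Emitter<String> emitter) {
      String value = lookup.get(input);
      // "UNKNOWN" is an arbitrary placeholder for keys missing from the lookup.
      emitter.emit(input + "\t" + (value == null ? "UNKNOWN" : value));
    }
  }

  /** Wires the small table into the DoFn via ReadableData. */
  static PCollection<String> annotate(PCollection<String> input,
                                      PTable<String, String> small) {
    // true asks Crunch to materialize the table before the dependent job runs.
    ReadableData<Pair<String, String>> readable = small.asReadable(true);
    // Declare the dependency so the planner runs the materialization first.
    ParallelDoOptions options = ParallelDoOptions.builder()
        .sourceTargets(readable.getSourceTargets())
        .build();
    return input.parallelDo("annotate", new LookupFn(readable),
        input.getTypeFamily().strings(), options);
  }
}
```

Unlike passing a HashMap as a constructor argument, this does not pull the data
back to the client first; each task reads the small data set itself when it
initializes.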
On Wed, Feb 24, 2016 at 12:19 PM Micah Whitacre wrote:

> How do you need the data in the DoFn?  One easy way of doing this might be
> a MapSideJoin[1] but that would probably require similar keys for what you
> are doing with the data and might not fit with adding supplementary data in
> the DoFn like you are intending.
> [1] -
> On Wed, Feb 24, 2016 at 2:04 PM, Robinson, Landon - Landon wrote:
>> Crunch Gurus,
>> Say I have a small data set of key/value pairs I’m reading into a
>> PCollection. I want to give that small set as a supplementary data set to
>> DoFns for comparisons.
>> I’ve done this before with hardcoded String arrays and such, but wanted
>> to know what best practice is for taking the contents of a very small
>> PCollection and handing it as an object to a DoFn.
>> I know I can turn it into a HashMap and pass it as an argument/param, but
>> is there a recommended way in Crunch? Thanks!
>> ---------------------------------------------------------------------------
>> Landon Robinson
>> Big Data & Hadoop Engineer
>> IT Business Intelligence, Lowe’s Companies Inc.
>> ---------------------------------------------------------------------------