crunch-user mailing list archives

From "Robinson, Landon - Landon"
Subject Understanding ScaleFactor in Crunch
Date Tue, 03 Nov 2015 16:38:02 GMT

I’m trying to understand how I might use scaleFactor() in my Crunch code.
My use case is this: I have data that I read into a PCollection that is smaller than my system’s
block size, but when processed in a DoFn, it grows considerably.

So a file that starts at 10 MB might become 10 times larger.

To prevent spills and memory issues, how could I leverage something like scaleFactor() (or
whatever is needed) to indicate to the Crunch planner that my resulting PCollection will grow?
Can I tell it to use more mappers/reducers for the DoFn?
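
For illustration, here is a minimal sketch of the kind of override being asked about. Crunch's DoFn has a scaleFactor() method that returns a float. Everything else below is a hypothetical stand-in so the example compiles without Crunch on the classpath: SketchDoFn, ExpandFn, and estimatedOutputBytes are made-up names, and the size arithmetic is only an assumption about how a planner might use such a hint, not Crunch's actual planner code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ScaleFactorSketch {

    /** Hypothetical stand-in for a DoFn: a per-record step plus a size hint. */
    abstract static class SketchDoFn<S, T> {
        abstract void process(S input, Consumer<T> emitter);

        /** Expected ratio of output size to input size (assumed neutral here). */
        float scaleFactor() {
            return 1.0f;
        }
    }

    /** Fans each input record out into ten output records. */
    static class ExpandFn extends SketchDoFn<String, String> {
        @Override
        void process(String input, Consumer<String> emitter) {
            for (int i = 0; i < 10; i++) {
                emitter.accept(input + "#" + i);
            }
        }

        @Override
        float scaleFactor() {
            // Hint that the output is expected to be about 10x the input.
            return 10.0f;
        }
    }

    /** Assumed planner-style estimate: input size multiplied by the hint. */
    static long estimatedOutputBytes(long inputBytes, SketchDoFn<?, ?> fn) {
        return (long) (inputBytes * (double) fn.scaleFactor());
    }

    public static void main(String[] args) {
        ExpandFn fn = new ExpandFn();
        List<String> out = new ArrayList<>();
        fn.process("row", out::add);

        long tenMb = 10L * 1024 * 1024;
        System.out.println(out.size() + " records, estimated bytes: "
                + estimatedOutputBytes(tenMb, fn));
    }
}
```

In real Crunch code the same override would go on a class extending org.apache.crunch.DoFn. Whether the hint changes the number of map or reduce tasks depends on the planner and job configuration, so treat the sketch as a description of intent rather than a guarantee.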

Guidance, if you could!


Landon Robinson

