crunch-user mailing list archives

From: "Robinson, Landon - Landon"
Subject: Crunch Planner Hint to Not Combine Tasks
Date: Tue, 24 Nov 2015 13:28:40 GMT

Hi all,

I have a Crunch job in which the planner combines the last four tasks of my program into one
M/R job. That’s normally not a problem, but my data starts small and grows dramatically in the
heaviest of those DoFn tasks, resulting in spills to disk (local, not HDFS).

I’ve already:

  *   Overridden scaleFactor() on the DoFn that emits more records than it consumes, so that
it returns 40.0f
  *   Set the io.sort.mb parameter to the cluster setting, which is 1792
  *   Enabled map-side output compression with Snappy
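
For reference, the two Hadoop-level settings in the list above could be written out roughly as follows. This is only a sketch: `ShuffleSettings` is a hypothetical helper, the compression keys are the Hadoop 2 property names (an assumption, since the message does not say which names were used), and in a real job these values would be set on the Hadoop Configuration handed to the Crunch pipeline rather than collected in a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that gathers the settings described in the list above.
class ShuffleSettings {
    static Map<String, String> build() {
        Map<String, String> props = new LinkedHashMap<>();
        // Map-side sort buffer size in MB, as quoted in the message.
        props.put("io.sort.mb", "1792");
        // Assumed Hadoop 2 key names for map output compression with Snappy.
        props.put("mapreduce.map.output.compress", "true");
        props.put("mapreduce.map.output.compress.codec",
                  "org.apache.hadoop.io.compress.SnappyCodec");
        return props;
    }

    public static void main(String[] args) {
        // Print each key=value pair in insertion order.
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```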

The data set I’m ingesting is the output of a previous MapReduce job: 19 files of 10 MB each
(which Crunch reads as 2 splits).

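
As a rough sanity check, the figures quoted above can be combined into a back-of-the-envelope estimate of how much data each task would handle after the expanding DoFn. The sketch below rests on several assumptions that are not in the message: the scale factor applies to bytes as well as record counts, the input divides evenly across the 2 splits, and serialized size tracks input size. `SpillEstimate` and its method names are hypothetical.

```java
// Back-of-the-envelope estimate using only numbers quoted in this message.
class SpillEstimate {
    // Estimated size after a DoFn whose output is scaleFactor times its input.
    static double scaledOutputMb(double inputMb, float scaleFactor) {
        return inputMb * scaleFactor;
    }

    // Share handled by each task, assuming the data divides evenly across splits.
    static double perSplitMb(double totalMb, int splits) {
        return totalMb / splits;
    }

    public static void main(String[] args) {
        double inputMb = 19 * 10.0;                        // 19 files of 10 MB = 190 MB
        double outputMb = scaledOutputMb(inputMb, 40.0f);  // 190 * 40 = 7600 MB
        double perSplit = perSplitMb(outputMb, 2);         // 7600 / 2 = 3800 MB
        int sortBufferMb = 1792;                           // io.sort.mb from the message

        System.out.println("Estimated expanded output (MB): " + outputMb);
        System.out.println("Estimated output per split (MB): " + perSplit);
        System.out.println("Exceeds sort buffer: " + (perSplit > sortBufferMb));
    }
}
```

Under these assumptions each split would produce about 3800 MB against a 1792 MB sort buffer, which would be consistent with the local spills described, though the real figure depends on record serialization and on where the planner places the DoFn.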
Landon Robinson
Big Data/Hadoop Engineer
