crunch-user mailing list archives

From Ron Hashimshony
Subject Re: Crunch Planner Hint to Not Combine Tasks
Date Tue, 24 Nov 2015 14:14:45 GMT
Try setting mapreduce.input.fileinputformat.split.minsize
and mapreduce.input.fileinputformat.split.maxsize to a value lower than the
default (usually 64 MB).
If you know of a specific DoFn that requires this, it is better to set it
there, in its configure function.
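
For reference, a minimal sketch of why lowering these two values yields smaller splits (and so more map tasks). It assumes the rule used by Hadoop's stock FileInputFormat, splitSize = max(minSize, min(maxSize, blockSize)). The class name, the 64 MB block size, and the sample values are assumptions for illustration only; Crunch may read inputs through a combining input format whose packing differs in detail.

```java
// Sketch of the split-size rule applied by Hadoop's plain FileInputFormat.
// The class and method names are illustrative, not part of Hadoop or Crunch.
final class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64 * MB; // assumed block size ("usually 64 MB")

        // Defaults: minsize is tiny and maxsize is effectively unbounded,
        // so the block size decides the split size.
        System.out.println(computeSplitSize(1L, Long.MAX_VALUE, blockSize) / MB); // 64

        // Lowering maxsize below the block size shrinks each split.
        System.out.println(computeSplitSize(1L, 4 * MB, blockSize) / MB); // 4

        // A minsize above maxsize wins, so keep minsize <= maxsize.
        System.out.println(computeSplitSize(32 * MB, 4 * MB, blockSize) / MB); // 32
    }
}
```

In a Crunch job, the two properties would be set on the Configuration object passed to the DoFn's configure method, for example with conf.setLong(...), so that the setting travels with the DoFn that needs it.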

On Tue, Nov 24, 2015 at 3:28 PM Robinson, Landon - Landon <> wrote:

> Hi all,
> I have a Crunch job that tries to combine the last four tasks of my
> program into one M/R job.
> That’s normally not a problem, but my data *starts small and grows
> exponentially* in the largest of those DoFn tasks, resulting in spills
> to disk (local, not HDFS).
> I’ve already:
>    - Implemented scaleFactor on the DoFn that emits more records than it
>    consumes, returning 40.0f
>    - Set the io.sort.mb parameter to the cluster setting, which is 1792
>    - Enabled map-side compression with Snappy
> The data set I’m ingesting comes from a previous MapReduce job: 19 files
> of 10 MB each (which in Crunch comes to 2 splits).
> Help?
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data/Hadoop Engineer
> ---------------------------------------------------------------------------
