crunch-user mailing list archives

From Micah Whitacre <mkwhita...@gmail.com>
Subject Re: Handling Spills in Crunch
Date Tue, 10 Nov 2015 15:19:15 GMT
Landon,

I don't believe there is anything specific in Crunch that will help you, but
you can definitely tweak some standard Hadoop configuration settings to
reduce spilling. Specifically, raising io.sort.mb (the map-side sort buffer
size) and adjusting the spill percentage (io.sort.spill.percent) should cut
down on the number of spills.

http://stackoverflow.com/questions/27890887/why-does-hadoop-spilling-happens
http://www.slideshare.net/cloudera/mr-perf
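
As a rough illustration of why those two settings matter, here is a small
back-of-the-envelope sketch. It assumes a map task spills each time its
output fills io.sort.mb * io.sort.spill.percent of the sort buffer, and it
ignores per-record metadata overhead, so treat the numbers as estimates
only. The class and method names are made up for this example; the 128 MB
input split is an assumption, and the 1.4 factor comes from the "40% more
data" figure in your message. In Hadoop 2 the same properties are named
mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent (defaults
100 and 0.80).

```java
// Hypothetical helper, not part of Crunch or Hadoop: estimates how many
// spill files a single map task writes for a given sort buffer setup.
class SpillEstimate {

    // mapOutputMb:  estimated map output for one task, in MB
    // sortMb:       value of io.sort.mb (sort buffer size in MB)
    // spillPercent: value of io.sort.spill.percent (0.0 to 1.0)
    static int estimateSpills(double mapOutputMb, int sortMb, double spillPercent) {
        if (sortMb <= 0 || spillPercent <= 0.0 || spillPercent > 1.0) {
            throw new IllegalArgumentException("invalid sort buffer settings");
        }
        if (mapOutputMb <= 0.0) {
            return 0; // nothing emitted, nothing spilled
        }
        // Simplification: a spill starts once this much data is buffered.
        double thresholdMb = sortMb * spillPercent;
        // Any non-empty output is flushed to disk at least once at task end.
        return Math.max(1, (int) Math.ceil(mapOutputMb / thresholdMb));
    }

    public static void main(String[] args) {
        // Assumed 128 MB input split, with the DoFn emitting 40% more data.
        double mapOutputMb = 128 * 1.4;

        // Default settings: 100 MB buffer, spill at 80% full.
        System.out.println("defaults: " + estimateSpills(mapOutputMb, 100, 0.80));

        // Larger buffer and higher threshold: the output fits in one pass.
        System.out.println("tuned:    " + estimateSpills(mapOutputMb, 256, 0.90));
    }
}
```

With these assumed numbers the default settings give about three spills per
map task, while the larger buffer gets it down to a single flush. Keep in
mind the sort buffer lives inside the task's heap, so raising io.sort.mb
usually means raising the task's Java heap setting as well.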

On Tue, Nov 10, 2015 at 8:57 AM, Robinson, Landon - Landon <
landon.t.robinson@lowes.com> wrote:

> Could use some guidance in dealing with spills. I have a data set that, in
> a DoFn, *grows* exponentially. As in, my dataset starts small, but I emit
> back maybe 40% more data than I take in.
> I’ve tried using scaleFactor() to compensate for this, but I seem to get
> this error at runtime using a MRPipeline:
>
> *org.apache.crunch.CrunchRuntimeException: java.io.IOException: Spill
> failed*
>
> Do I need to increase java memory opts perhaps?
>
> Best,
> Landon
> ---------------------------------------------------------------------------
> Landon Robinson
> ---------------------------------------------------------------------------
