crunch-user mailing list archives

From Ron Hashimshony <ron.hashimsh...@myheritage.com>
Subject Re: Handling Spills in Crunch
Date Tue, 10 Nov 2015 20:46:41 GMT
This is usually a sign that your nodes don't have enough local disk
storage.
Try compressing the intermediate data (if it is already compressed, try
deflate rather than snappy, since deflate gives a better compression
ratio), or try enlarging your cluster.
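
A minimal sketch of the settings discussed in this thread, collected in one place. The property names are the current Hadoop 2 names for map output compression and for the sort buffer settings mentioned further down the thread (io.sort.mb and the spill percentage). The class name SpillTuning and all values are illustrative assumptions, not recommendations from anyone in the thread. DefaultCodec is Hadoop's zlib/deflate codec.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: gathers spill-related Hadoop properties as plain strings.
public class SpillTuning {

    /** Illustrative values only (assumptions); tune them for your own cluster. */
    public static Map<String, String> settings() {
        Map<String, String> s = new LinkedHashMap<>();
        // Compress intermediate map output so spills take less local disk.
        s.put("mapreduce.map.output.compress", "true");
        // Deflate (zlib) codec: slower than snappy, but usually a better ratio.
        s.put("mapreduce.map.output.compress.codec",
              "org.apache.hadoop.io.compress.DefaultCodec");
        // Larger in-memory sort buffer, in MB (formerly io.sort.mb).
        // The map task heap must be big enough to hold it.
        s.put("mapreduce.task.io.sort.mb", "256");
        // Buffer fill fraction that triggers a spill (formerly io.sort.spill.percent).
        s.put("mapreduce.map.sort.spill.percent", "0.90");
        return s;
    }

    public static void main(String[] args) {
        // Print as key=value pairs, e.g. for use as -D options on the command line.
        settings().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With Hadoop and Crunch on the classpath, each entry could presumably be applied with conf.set(key, value) on the Configuration passed to the MRPipeline constructor, before the pipeline runs.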

*Ron Hashimshony*

On Tue, Nov 10, 2015 at 8:07 PM, Robinson, Landon - Landon <
landon.t.robinson@lowes.com> wrote:

> The specific error I’m getting is related to this:
> https://support.pivotal.io/hc/en-us/articles/205647417-Map-Reduce-job-failed-with-Could-not-find-any-valid-local-directory-for-output-attempt-xxxx-xxxx-m-x-file-out
>
> Does Crunch offer a compression shortcut in code, or am I better off
> enabling compression of the mapper output with the
> mapreduce.map.output.compress = true param?
>
> Thanks again.
> - Landon
>
> From: Micah Whitacre <mkwhitacre@gmail.com>
> Reply-To: "user@crunch.apache.org" <user@crunch.apache.org>
> Date: Tuesday, November 10, 2015 at 10:19 AM
> To: "user@crunch.apache.org" <user@crunch.apache.org>
> Subject: Re: Handling Spills in Crunch
>
> Landon,
>
> I don't believe there is anything specific in Crunch that will help you,
> but you can definitely tweak some normal Hadoop configuration settings to
> try to help with spilling.  Specifically, tweaking settings like the spill
> percentage (io.sort.spill.percent) and io.sort.mb will help reduce the
> spilling.
>
>
> http://stackoverflow.com/questions/27890887/why-does-hadoop-spilling-happens
> http://www.slideshare.net/cloudera/mr-perf
>
> On Tue, Nov 10, 2015 at 8:57 AM, Robinson, Landon - Landon <
> landon.t.robinson@lowes.com> wrote:
>
>> Could use some guidance in dealing with spills. I have a data set that,
>> in a DoFn, *grows* exponentially. As in, my dataset starts small, but I
>> emit back maybe 40% more data than I take in.
>> I’ve tried using scaleFactor() to compensate for this, but I seem to get
>> this error at runtime using an MRPipeline:
>>
>> *org.apache.crunch.CrunchRuntimeException: java.io.IOException: Spill
>> failed*
>>
>> Do I need to increase java memory opts perhaps?
>>
>> Best,
>> Landon
>>
>> NOTICE: All information in and attached to the e-mails below may be
>> proprietary, confidential, privileged and otherwise protected from improper
>> or erroneous disclosure. If you are not the sender's intended recipient,
>> you are not authorized to intercept, read, print, retain, copy, forward, or
>> disseminate this message. If you have erroneously received this
>> communication, please notify the sender immediately by phone
>> (704-758-1000) or by e-mail and destroy all copies of this message
>> electronic, paper, or otherwise.
>>
>> *By transmitting documents via this email: Users, Customers, Suppliers
>> and Vendors collectively acknowledge and agree the transmittal of
>> information via email is voluntary, is offered as a convenience, and is not
>> a secured method of communication; Not to transmit any payment information
>> E.G. credit card, debit card, checking account, wire transfer information,
>> passwords, or sensitive and personal information E.G. Driver's license,
>> DOB, social security, or any other information the user wishes to remain
>> confidential; To transmit only non-confidential information such as plans,
>> pictures and drawings and to assume all risk and liability for and
>> indemnify Lowe's from any claims, losses or damages that may arise from the
>> transmittal of documents or including non-confidential information in the
>> body of an email transmittal. Thank you. *
>>
>
