crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Handling Spills in Crunch
Date Wed, 11 Nov 2015 17:30:14 GMT
Good karma Landon! Thanks!
On Wed, Nov 11, 2015 at 9:10 AM Robinson, Landon - Landon <
landon.t.robinson@lowes.com> wrote:

> Through a combination of a few conf parameters, I was able to fix the
> spills issue.
>
>    - Map output compression with Snappy
>    - Setting mapreduce.task.io.sort.mb to match our cluster setting
>
>
> *Properties File:*
>
> mapred.compress.map.output=true
>
> mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
>
> mapreduce.task.io.sort.mb=1792
>
> *Crunch Code:*
>
> crunchConf.set("mapred.compress.map.output", mapCompress);
> crunchConf.set("mapred.map.output.compression.codec", mapCompressionCodec);
> crunchConf.set("mapreduce.task.io.sort.mb", mapTaskSortMB);
>
> Pipeline pipeline = new MRPipeline(TransformMR.class, "Crunch Pipeline", crunchConf);
>
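A minimal sketch of how the properties above could be read and handed to a job configuration. It uses only the JDK so it runs on its own: the `SpillSettings` class and its `apply` method are hypothetical names, and a `TreeMap` stands in for the Hadoop `Configuration`. With Hadoop on the classpath, the setter passed in could be `crunchConf::set`. The `mapred.*` keys are kept exactly as written in the message; newer Hadoop releases also accept `mapreduce.map.output.compress` and `mapreduce.map.output.compress.codec` for the same two settings.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical helper for illustration; not part of Crunch or Hadoop.
final class SpillSettings {
    // Property names exactly as they appear in the message above.
    static final String[] KEYS = {
        "mapred.compress.map.output",
        "mapred.map.output.compression.codec",
        "mapreduce.task.io.sort.mb"
    };

    // Passes each known key that is present in props to the setter.
    // Keys that are missing from props are skipped, so cluster defaults stay in effect.
    static void apply(Properties props, BiConsumer<String, String> setter) {
        for (String key : KEYS) {
            String value = props.getProperty(key);
            if (value != null) {
                setter.accept(key, value.trim());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Inline copy of the properties file from the message.
        String file = String.join("\n",
            "mapred.compress.map.output=true",
            "mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec",
            "mapreduce.task.io.sort.mb=1792");
        Properties props = new Properties();
        props.load(new StringReader(file));

        // Stand-in for org.apache.hadoop.conf.Configuration (an assumption of this sketch).
        Map<String, String> conf = new TreeMap<>();
        apply(props, conf::put);
        conf.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Taking the setter as a parameter keeps the sketch independent of Hadoop while leaving the call site the same shape as the `crunchConf.set(...)` lines in the message.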
> Thanks everyone for the input. We have a beefy cluster, but Crunch didn’t
> pick up some of our settings, such as io.sort.mb (it was set to 100 MB,
> while our cluster value is 1792 MB).
> Thanks again, just thought I’d share what I learned.
> ---------------------------------------------------------------------------
> Landon Robinson
> ---------------------------------------------------------------------------
>
> From: Micah Whitacre <mkwhitacre@gmail.com>
> Reply-To: "user@crunch.apache.org" <user@crunch.apache.org>
> Date: Tuesday, November 10, 2015 at 3:27 PM
> To: "user@crunch.apache.org" <user@crunch.apache.org>
> Subject: Re: Handling Spills in Crunch
>
> In my quick search I didn't find any shortcuts but Crunch should honor any
> of the normal Hadoop config.  If you find it doesn't then feel free to log
> an issue.
>
> I believe the general rule is that if you set the io.sort.mb to 25% of
> your Map or Reduce JVM that should help cut down on data written to local
> disk as well.
>
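The 25% rule of thumb above can be worked through as simple arithmetic. The sketch below is an illustration only: the class and method names are hypothetical, and the 7168 MB heap is an assumed example value (the thread does not state the actual task heap size). It also caps the result at 2047 MB, on the assumption that Hadoop rejects `mapreduce.task.io.sort.mb` values of 2048 or more; that limit is worth checking against the Hadoop version in use.

```java
// Hypothetical helper for illustration; not part of Crunch or Hadoop.
final class SortBufferSizing {
    // Assumed upper bound for mapreduce.task.io.sort.mb (verify for your Hadoop version).
    static final int MAX_SORT_MB = 2047;

    // Returns a sort buffer size in MB as a fraction of the task JVM heap,
    // clamped to the range [1, MAX_SORT_MB].
    static int sortMbForHeap(int heapMb, double fraction) {
        if (heapMb <= 0 || fraction <= 0.0 || fraction >= 1.0) {
            throw new IllegalArgumentException("heapMb must be > 0 and fraction in (0, 1)");
        }
        int sortMb = (int) Math.floor(heapMb * fraction);
        return Math.max(1, Math.min(sortMb, MAX_SORT_MB));
    }

    public static void main(String[] args) {
        int heapMb = 7168; // assumed example heap size, not taken from the thread
        System.out.println("mapreduce.task.io.sort.mb=" + sortMbForHeap(heapMb, 0.25));
    }
}
```

With the assumed 7168 MB heap, a quarter is 1792 MB, which happens to equal the value reported in this thread. A larger heap such as 16384 MB would hit the cap instead of producing 4096.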
> On Tue, Nov 10, 2015 at 12:07 PM, Robinson, Landon - Landon <
> landon.t.robinson@lowes.com> wrote:
>
>> The specific error I’m getting is related to this:
>> https://support.pivotal.io/hc/en-us/articles/205647417-Map-Reduce-job-failed-with-Could-not-find-any-valid-local-directory-for-output-attempt-xxxx-xxxx-m-x-file-out
>>
>> Does Crunch offer a compression shortcut in-code, or am I better off
>> compressing the mapper output with the
>> mapreduce.map.output.compress = true param?
>>
>> Thanks again.
>> - Landon
>>
>> ---------------------------------------------------------------------------
>>
>> Landon Robinson
>>
>> ---------------------------------------------------------------------------
>>
>> From: Micah Whitacre <mkwhitacre@gmail.com>
>> Reply-To: "user@crunch.apache.org" <user@crunch.apache.org>
>> Date: Tuesday, November 10, 2015 at 10:19 AM
>> To: "user@crunch.apache.org" <user@crunch.apache.org>
>> Subject: Re: Handling Spills in Crunch
>>
>> Landon,
>>
>> I don't believe there is anything specific in Crunch that will help you
>> but you can definitely tweak some normal Hadoop configuration settings to
>> try and help with spilling.  Specifically tweaking settings like spill
>> percentage and the io.sort.mb will help reduce the spilling.
>>
>>
>> http://stackoverflow.com/questions/27890887/why-does-hadoop-spilling-happens
>> http://www.slideshare.net/cloudera/mr-perf
>>
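To see why the spill percentage and io.sort.mb matter together, here is a back-of-envelope sketch. It assumes a deliberately simplified model: a map task spills each time its sort buffer fills to the spill threshold (`mapreduce.map.sort.spill.percent`, 0.80 by default in Hadoop), and per-record metadata overhead is ignored. The class name, method name, and the 1000 MB map output figure are hypothetical; real spill counts will differ.

```java
// Hypothetical helper for illustration; not part of Crunch or Hadoop.
final class SpillEstimate {
    // Rough estimate of spill files per map task under the simplified model:
    // one spill each time (sortMb * spillPercent) MB of output has been buffered.
    static long estimateSpills(double mapOutputMb, int sortMb, double spillPercent) {
        if (sortMb <= 0 || spillPercent <= 0.0 || spillPercent > 1.0) {
            throw new IllegalArgumentException("sortMb must be > 0 and spillPercent in (0, 1]");
        }
        if (mapOutputMb <= 0.0) {
            return 0;
        }
        double thresholdMb = sortMb * spillPercent;
        return (long) Math.ceil(mapOutputMb / thresholdMb);
    }

    public static void main(String[] args) {
        double mapOutputMb = 1000.0; // assumed output of a single map task
        System.out.println("io.sort.mb=100  -> " + estimateSpills(mapOutputMb, 100, 0.80) + " spill(s)");
        System.out.println("io.sort.mb=1792 -> " + estimateSpills(mapOutputMb, 1792, 0.80) + " spill(s)");
    }
}
```

Under this model a larger buffer means fewer spill files to write and merge on local disk. Compressing map output reduces the bytes written per spill but does not change the count in this estimate.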
>> On Tue, Nov 10, 2015 at 8:57 AM, Robinson, Landon - Landon <
>> landon.t.robinson@lowes.com> wrote:
>>
>>> I could use some guidance in dealing with spills. I have a data set that
>>> *grows* inside a DoFn: it starts small, but I emit back maybe 40% more
>>> data than I take in.
>>> I’ve tried using scaleFactor() to compensate for this, but I get this
>>> error at runtime when using an MRPipeline:
>>>
>>> *org.apache.crunch.CrunchRuntimeException: java.io.IOException: Spill
>>> failed*
>>>
>>> Do I need to increase java memory opts perhaps?
>>>
>>> Best,
>>> Landon
>>>
>>> ---------------------------------------------------------------------------
>>> Landon Robinson
>>>
>>> ---------------------------------------------------------------------------
>>> NOTICE: All information in and attached to the e-mails below may be
>>> proprietary, confidential, privileged and otherwise protected from improper
>>> or erroneous disclosure. If you are not the sender's intended recipient,
>>> you are not authorized to intercept, read, print, retain, copy, forward, or
>>> disseminate this message. If you have erroneously received this
>>> communication, please notify the sender immediately by phone
>>> (704-758-1000) or by e-mail and destroy all copies of this message
>>> electronic, paper, or otherwise.
>>>
>>> *By transmitting documents via this email: Users, Customers, Suppliers
>>> and Vendors collectively acknowledge and agree the transmittal of
>>> information via email is voluntary, is offered as a convenience, and is not
>>> a secured method of communication; Not to transmit any payment information
>>> E.G. credit card, debit card, checking account, wire transfer information,
>>> passwords, or sensitive and personal information E.G. Driver's license,
>>> DOB, social security, or any other information the user wishes to remain
>>> confidential; To transmit only non-confidential information such as plans,
>>> pictures and drawings and to assume all risk and liability for and
>>> indemnify Lowe's from any claims, losses or damages that may arise from the
>>> transmittal of documents or including non-confidential information in the
>>> body of an email transmittal. Thank you. *
>>>
>>
>
