crunch-user mailing list archives

From "Robinson, Landon - Landon" <landon.t.robin...@lowes.com>
Subject Re: Handling Spills in Crunch
Date Wed, 11 Nov 2015 17:09:58 GMT
Through a combination of a few configuration parameters, I was able to fix the spill issue:

  *   Map output compression with Snappy
  *   Setting mapreduce.task.io.sort.mb to our cluster's value

Properties file:

mapred.compress.map.output=true
mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec
mapreduce.task.io.sort.mb=1792

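Crunch code:

import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;

// crunchConf is a Hadoop Configuration; the three values are presumably the Strings from the properties file
crunchConf.set("mapred.compress.map.output", mapCompress);
crunchConf.set("mapred.map.output.compression.codec", mapCompressionCodec);
crunchConf.set("mapreduce.task.io.sort.mb", mapTaskSortMB);
// Passing crunchConf to MRPipeline makes the jobs it plans run with these settings
Pipeline pipeline = new MRPipeline(TransformMR.class, "Crunch Pipeline", crunchConf);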
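The message doesn't show where mapCompress, mapCompressionCodec and mapTaskSortMB come from. Assuming they are read from the properties file above, the loading step could look roughly like this stdlib-only sketch. The class SpillConfLoader and its applyAll method are hypothetical names, and a TreeMap stands in for the Hadoop Configuration so the example runs without Hadoop on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import java.util.function.BiConsumer;

final class SpillConfLoader {
    private SpillConfLoader() {}

    /**
     * Hypothetical helper: loads key=value pairs in java.util.Properties format
     * (blank lines are ignored) and hands each pair to the supplied setter.
     */
    static void applyAll(Reader source, BiConsumer<String, String> setter) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : props.stringPropertyNames()) {
            setter.accept(key, props.getProperty(key));
        }
    }

    public static void main(String[] args) {
        // Same three settings as the properties file in the message above.
        String file = String.join("\n",
                "mapred.compress.map.output=true",
                "",
                "mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec",
                "",
                "mapreduce.task.io.sort.mb=1792");
        // A TreeMap stands in for the Hadoop Configuration in this sketch.
        Map<String, String> conf = new TreeMap<>();
        applyAll(new StringReader(file), conf::put);
        conf.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In the job itself, calling applyAll(reader, crunchConf::set) would copy the same entries into the Configuration, since Configuration.set(String, String) fits BiConsumer<String, String>.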

Thanks, everyone, for the input. We have a beefy cluster, but Crunch didn’t pick up some of our
cluster settings, such as io.sort.mb: the job was running with 100 MB, while our cluster value is 1792 MB.
Thanks again; I just thought I’d share what I learned.
---------------------------------------------------------------------------
Landon Robinson
---------------------------------------------------------------------------

From: Micah Whitacre <mkwhitacre@gmail.com>
Reply-To: user@crunch.apache.org
Date: Tuesday, November 10, 2015 at 3:27 PM
To: user@crunch.apache.org
Subject: Re: Handling Spills in Crunch

In my quick search I didn't find any shortcuts, but Crunch should honor all of the normal Hadoop
configuration. If you find that it doesn't, feel free to log an issue.

I believe the general rule of thumb is to set io.sort.mb to about 25% of your map or reduce
JVM heap; that should also help cut down on the data written to local disk.
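The 25% rule of thumb is simple arithmetic on the task heap (for example the -Xmx set through mapreduce.map.java.opts). A minimal sketch follows; SortBufferSizing and suggestSortMb are hypothetical names. It also applies a cap: Hadoop's MapTask rejects a mapreduce.task.io.sort.mb value that does not fit in 11 bits, so 2047 MB is the largest usable buffer. For reference, the 1792 MB setting from the first message is exactly 25% of a 7168 MB heap, though the thread doesn't say what heap was actually used.

```java
final class SortBufferSizing {
    // Hadoop's MapTask rejects a mapreduce.task.io.sort.mb value that does not fit in
    // 11 bits, so 2047 MB is the largest usable map-side sort buffer.
    static final int MAX_SORT_MB = 2047;

    private SortBufferSizing() {}

    /** Suggests io.sort.mb as a fraction (e.g. 0.25) of the task JVM heap, capped at MAX_SORT_MB. */
    static int suggestSortMb(int heapMb, double fraction) {
        if (heapMb <= 0 || fraction <= 0 || fraction >= 1) {
            throw new IllegalArgumentException("heapMb must be > 0 and fraction in (0, 1)");
        }
        int suggested = (int) (heapMb * fraction);
        return Math.min(suggested, MAX_SORT_MB);
    }

    public static void main(String[] args) {
        for (int heapMb : new int[] {1024, 4096, 7168, 10240}) {
            System.out.println(heapMb + " MB heap -> io.sort.mb=" + suggestSortMb(heapMb, 0.25));
        }
    }
}
```

With a fraction of 0.25 this prints 256, 1024, 1792 and 2047; the last value is capped, since 25% of 10240 MB would be 2560 MB.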

On Tue, Nov 10, 2015 at 12:07 PM, Robinson, Landon - Landon <landon.t.robinson@lowes.com> wrote:
The specific error I’m getting is related to this: https://support.pivotal.io/hc/en-us/articles/205647417-Map-Reduce-job-failed-with-Could-not-find-any-valid-local-directory-for-output-attempt-xxxx-xxxx-m-x-file-out

Does Crunch offer an in-code shortcut for compression, or am I better off enabling compression
of the mapper output with the mapreduce.map.output.compress=true parameter?
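Hadoop 2 treats the older MRv1 property names as aliases for the MRv2 names, so mapreduce.map.output.compress (asked about here) and mapred.compress.map.output (used in the final fix) control the same setting. The sketch below is a hand-copied subset of that mapping for illustration only; it is not Hadoop's own deprecation API, and MapOutputKeys and currentName are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MapOutputKeys {
    private MapOutputKeys() {}

    // A few old (MRv1) property names and the MRv2 names Hadoop 2 maps them to.
    private static final Map<String, String> DEPRECATED_TO_CURRENT = new LinkedHashMap<>();

    static {
        DEPRECATED_TO_CURRENT.put("mapred.compress.map.output", "mapreduce.map.output.compress");
        DEPRECATED_TO_CURRENT.put("mapred.map.output.compression.codec", "mapreduce.map.output.compress.codec");
        DEPRECATED_TO_CURRENT.put("io.sort.mb", "mapreduce.task.io.sort.mb");
        DEPRECATED_TO_CURRENT.put("io.sort.spill.percent", "mapreduce.map.sort.spill.percent");
    }

    /** Returns the MRv2 name for a key, or the key itself if it is not in this table. */
    static String currentName(String key) {
        return DEPRECATED_TO_CURRENT.getOrDefault(key, key);
    }

    public static void main(String[] args) {
        for (String key : DEPRECATED_TO_CURRENT.keySet()) {
            System.out.println(key + " -> " + currentName(key));
        }
    }
}
```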

Thanks again.
- Landon
---------------------------------------------------------------------------

Landon Robinson
---------------------------------------------------------------------------

From: Micah Whitacre <mkwhitacre@gmail.com>
Reply-To: user@crunch.apache.org
Date: Tuesday, November 10, 2015 at 10:19 AM
To: user@crunch.apache.org
Subject: Re: Handling Spills in Crunch

Landon,

I don't believe there is anything Crunch-specific that will help you, but you can definitely
tweak some standard Hadoop configuration settings to reduce spilling. In particular,
tuning the spill percentage and io.sort.mb should help.
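The effect of those two settings can be approximated with a simplified model: the map-side sort buffer starts spilling to local disk once it is mapreduce.map.sort.spill.percent full (0.80 by default), so a larger buffer means fewer spill files to write and later merge. The sketch ignores per-record metadata, serialization overhead and compression, and the 1000 MB of output per map task is a made-up figure; treat the numbers as an order-of-magnitude guide, not a prediction of Hadoop's exact behavior.

```java
final class SpillEstimate {
    private SpillEstimate() {}

    /**
     * Rough number of spill files for one map task under a simplified model:
     * one spill each time the buffer reaches sortMb * spillPercent, with at
     * least one file for the final flush.
     */
    static long estimateSpills(double mapOutputMb, int sortMb, double spillPercent) {
        if (sortMb <= 0 || spillPercent <= 0 || spillPercent > 1) {
            throw new IllegalArgumentException("sortMb must be > 0 and spillPercent in (0, 1]");
        }
        double thresholdMb = sortMb * spillPercent;
        return Math.max(1L, (long) Math.ceil(mapOutputMb / thresholdMb));
    }

    public static void main(String[] args) {
        double outputMb = 1000; // hypothetical map output per task, in MB
        System.out.println("io.sort.mb=100:  " + estimateSpills(outputMb, 100, 0.80) + " spills");
        System.out.println("io.sort.mb=1792: " + estimateSpills(outputMb, 1792, 0.80) + " spills");
    }
}
```

Under this model, going from the 100 MB default to 1792 MB drops the estimate from 13 spill files to 1 for the same hypothetical output.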

http://stackoverflow.com/questions/27890887/why-does-hadoop-spilling-happens
http://www.slideshare.net/cloudera/mr-perf

On Tue, Nov 10, 2015 at 8:57 AM, Robinson, Landon - Landon <landon.t.robinson@lowes.com> wrote:
Could use some guidance in dealing with spills. I have a data set that grows considerably in a DoFn:
it starts small, but I emit roughly 40% more data than I take in.
I’ve tried using scaleFactor() to compensate for this, but I still get this error at runtime
using an MRPipeline:

org.apache.crunch.CrunchRuntimeException: java.io.IOException: Spill failed

Do I perhaps need to increase the Java memory options?

Best,
Landon
---------------------------------------------------------------------------
Landon Robinson
---------------------------------------------------------------------------
NOTICE: All information in and attached to the e-mails below may be proprietary, confidential,
privileged and otherwise protected from improper or erroneous disclosure. If you are not the
sender's intended recipient, you are not authorized to intercept, read, print, retain, copy,
forward, or disseminate this message. If you have erroneously received this communication,
please notify the sender immediately by phone (704-758-1000) or
by e-mail and destroy all copies of this message electronic, paper, or otherwise.

By transmitting documents via this email: Users, Customers, Suppliers and Vendors collectively
acknowledge and agree the transmittal of information via email is voluntary, is offered as
a convenience, and is not a secured method of communication; Not to transmit any payment information
E.G. credit card, debit card, checking account, wire transfer information, passwords, or sensitive
and personal information E.G. Driver's license, DOB, social security, or any other information
the user wishes to remain confidential; To transmit only non-confidential information such
as plans, pictures and drawings and to assume all risk and liability for and indemnify Lowe's
from any claims, losses or damages that may arise from the transmittal of documents or including
non-confidential information in the body of an email transmittal. Thank you.