From: "Robinson, Landon - Landon"
To: "user@crunch.apache.org"
Reply-To: user@crunch.apache.org
Subject: Re: Handling Spills in Crunch
Date: Tue, 10 Nov 2015 18:07:33 +0000

The specific error I'm getting is related to this: https://support.pivotal.io/hc/en-us/articles/205647417-Map-Reduce-job-failed-with-Could-not-find-any-valid-local-directory-for-output-attempt-xxxx-xxxx-m-x-file-out

Does Crunch offer a compression shortcut in code, or am I better off compressing the mapper output with the mapreduce.map.output.compress=true param?

Thanks again.

- Landon

---------------------------------------------------------------------------
Landon Robinson
---------------------------------------------------------------------------

From: Micah Whitacre
Reply-To: "user@crunch.apache.org"
Date: Tuesday, November 10, 2015 at 10:19 AM
To: "user@crunch.apache.org"
Subject: Re: Handling Spills in Crunch

Landon,

I don't believe there is anything specific in Crunch that will help you, but you can definitely tweak some normal Hadoop configuration settings to try to help with spilling. Specifically, tweaking settings like the spill percentage and io.sort.mb will help reduce the spilling.

http://stackoverflow.com/questions/27890887/why-does-hadoop-spilling-happens
http://www.slideshare.net/cloudera/mr-perf

On Tue, Nov 10, 2015 at 8:57 AM, Robinson, Landon - Landon wrote:

Could use some guidance in dealing with spills. I have a data set that, in a DoFn, grows exponentially. As in, my dataset starts small, but I emit back maybe 40% more data than I take in.

I've tried using scaleFactor() to compensate for this, but I seem to get this error at runtime using an MRPipeline:

org.apache.crunch.CrunchRuntimeException: java.io.IOException: Spill failed

Do I need to increase java memory opts perhaps?

Best,
Landon

---------------------------------------------------------------------------
Landon Robinson
---------------------------------------------------------------------------

NOTICE: All information in and attached to the e-mails below may be proprietary, confidential, privileged and otherwise protected from improper or erroneous disclosure. If you are not the sender's intended recipient, you are not authorized to intercept, read, print, retain, copy, forward, or disseminate this message. If you have erroneously received this communication, please notify the sender immediately by phone (704-758-1000) or by e-mail and destroy all copies of this message electronic, paper, or otherwise.

By transmitting documents via this email: Users, Customers, Suppliers and Vendors collectively acknowledge and agree the transmittal of information via email is voluntary, is offered as a convenience, and is not a secured method of communication; Not to transmit any payment information E.G. credit card, debit card, checking account, wire transfer information, passwords, or sensitive and personal information E.G. Driver's license, DOB, social security, or any other information the user wishes to remain confidential; To transmit only non-confidential information such as plans, pictures and drawings and to assume all risk and liability for and indemnify Lowe's from any claims, losses or damages that may arise from the transmittal of documents or including non-confidential information in the body of an email transmittal. Thank you.
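
Editor's sketch, not part of the original thread: the effect of the two settings Micah mentions (io.sort.mb and the spill percentage) can be illustrated with rough arithmetic. The model is a simplifying assumption: the map-side sort buffer is treated as flushing each time it holds io.sort.mb multiplied by the spill percentage, and per-record bookkeeping is ignored. The class name, method names, and the 128 MB split size are hypothetical. The 100 MB and 0.80 values are Hadoop's usual defaults, and the 1.4 factor comes from the "40% more data" figure in the thread.

```java
/**
 * Back-of-the-envelope estimate of map-side spills for one task.
 * Assumption: the sort buffer flushes whenever it holds
 * ioSortMb * spillPercent bytes; per-record metadata is ignored.
 */
class SpillEstimate {

    /** Bytes the sort buffer holds before a spill is triggered. */
    static long spillThresholdBytes(int ioSortMb, double spillPercent) {
        return (long) (ioSortMb * 1024L * 1024L * spillPercent);
    }

    /** Approximate number of spills for a task that emits mapOutputBytes. */
    static long estimateSpills(long mapOutputBytes, int ioSortMb, double spillPercent) {
        long threshold = spillThresholdBytes(ioSortMb, spillPercent);
        if (threshold <= 0) {
            throw new IllegalArgumentException("sort buffer threshold must be positive");
        }
        return (mapOutputBytes + threshold - 1) / threshold; // ceiling division
    }

    public static void main(String[] args) {
        long inputBytes = 128L * 1024 * 1024;          // hypothetical 128 MB input split
        long outputBytes = (long) (inputBytes * 1.4);  // DoFn emits ~40% more than it reads

        System.out.println("io.sort.mb=100, spill percent=0.80: "
                + estimateSpills(outputBytes, 100, 0.80) + " spills");
        System.out.println("io.sort.mb=256, spill percent=0.90: "
                + estimateSpills(outputBytes, 256, 0.90) + " spills");
    }
}
```

A larger buffer or a higher spill percentage lowers the estimated spill count, which means fewer spill files to merge. It does not shrink the total bytes written to local disk, so it may not be enough on its own if the local directories are short on space.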

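
Editor's sketch, not part of the original thread: the settings discussed above can be written out as concrete Hadoop property names. The thread uses the older name io.sort.mb; in Hadoop 2 the equivalent keys are mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent, alongside the mapreduce.map.output.compress flag Landon asks about. The helper below only builds the key/value pairs and renders them as `-D` arguments. The class name, method names, and example values are hypothetical, and `-D` arguments are only honored when the job driver parses generic options (for example by running through ToolRunner).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Collects spill-related Hadoop settings and renders them as -D arguments. */
class SpillTuningFlags {

    /** Builds the property map; the values passed in are illustrative, not recommendations. */
    static Map<String, String> settings(int ioSortMb, double spillPercent, boolean compressMapOutput) {
        Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
        props.put("mapreduce.task.io.sort.mb", Integer.toString(ioSortMb));
        props.put("mapreduce.map.sort.spill.percent", Double.toString(spillPercent));
        props.put("mapreduce.map.output.compress", Boolean.toString(compressMapOutput));
        return props;
    }

    /** Renders each entry as -Dkey=value, separated by single spaces. */
    static String toDashD(Map<String, String> props) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> entry : props.entrySet()) {
            joiner.add("-D" + entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toDashD(settings(256, 0.9, true)));
    }
}
```

The same key/value pairs could instead be set on the Hadoop Configuration object handed to the MRPipeline constructor, which avoids depending on command-line parsing.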