From: "Robinson, Landon - Landon"
To: user@crunch.apache.org
Subject: Understanding ScaleFactor in Crunch
Date: Tue, 3 Nov 2015 16:38:02 +0000

All,

I'm trying to understand how I might use scaleFactor() in my Crunch code.
My use case is this: I have data that I read into a PCollection that is smaller than my system's block size, but when processed in a DoFn, it grows substantially.

So what started as a 10 MB file might become 10 times larger.

To prevent spills and memory issues, how could I use scaleFactor() (or whatever else is needed) to tell the Crunch planner that my resulting PCollection will be much larger than its input?
Can I tell it to use more mappers/reducers for the DoFn?
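
To make the question concrete, here is a minimal, self-contained sketch of the idea. It is not Crunch code: `SizedFn` is a hypothetical stand-in for the overridable `scaleFactor()` method on Crunch's `DoFn`, and `ExpandingFn`, `estimateOutputBytes`, `suggestedTasks`, and the 32 MB per-task figure are assumptions made up for illustration. In a real pipeline the override would go on a class extending `DoFn`, and how the planner turns the size estimate into task counts may differ from the simple rule modelled here.

```java
final class ScaleFactorSketch {
    /** Estimated output size: input size multiplied by the function's scale factor. */
    static long estimateOutputBytes(long inputBytes, SizedFn fn) {
        return (long) (inputBytes * (double) fn.scaleFactor());
    }

    /** Assumed sizing rule: one task per bytesPerTask of estimated data, at least one. */
    static int suggestedTasks(long estimatedBytes, long bytesPerTask) {
        return (int) Math.max(1L, (estimatedBytes + bytesPerTask - 1) / bytesPerTask);
    }

    public static void main(String[] args) {
        long inputBytes = 10L * 1024 * 1024;    // the 10 MB input from the question
        long bytesPerTask = 32L * 1024 * 1024;  // illustrative value only
        long estimated = estimateOutputBytes(inputBytes, new ExpandingFn());
        System.out.println("estimated output bytes: " + estimated);
        System.out.println("suggested tasks: " + suggestedTasks(estimated, bytesPerTask));
    }
}

/** Hypothetical stand-in for the size hint exposed by Crunch's DoFn. */
abstract class SizedFn {
    /** Estimated ratio of output size to input size. */
    abstract float scaleFactor();
}

/** Hypothetical function that emits roughly ten times the data it reads. */
class ExpandingFn extends SizedFn {
    @Override
    float scaleFactor() {
        return 10.0f;
    }
}
```

Under these assumptions, a scale factor of 10 turns the 10 MB input into an estimated 100 MB of output, which the assumed rule would spread over several tasks instead of one.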

Guidance, if you could!

Thanks,
Landon