crunch-user mailing list archives

From Luke Hansen <l...@wealthfront.com>
Subject Re: Hadoop Configuration from DoFn
Date Tue, 13 Oct 2015 20:56:36 GMT
Thanks for the quick replies everyone!  Setting the configuration at the
pipeline level (as opposed to the DoFn level) worked.
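
A toy sketch of the override problem described in the quoted replies, in plain Java with no Crunch or Hadoop dependency. It assumes, purely for illustration, that every function planned into one job writes to a single shared configuration map and that the hooks run in list order. `SharedConfSketch`, `planJob`, and the hook list are hypothetical names, not Crunch APIs, and the real planner may behave differently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class SharedConfSketch {

    /**
     * Toy model of planning one job: copy the pipeline-level settings, then
     * let every function grouped into the job modify that same map in turn.
     * (Assumption: one shared map per job, hooks applied in list order.)
     */
    static Map<String, String> planJob(Map<String, String> pipelineConf,
                                       List<Consumer<Map<String, String>>> configureHooks) {
        Map<String, String> jobConf = new HashMap<>(pipelineConf);
        for (Consumer<Map<String, String>> hook : configureHooks) {
            hook.accept(jobConf); // each hook sees, and may overwrite, earlier values
        }
        return jobConf;
    }

    public static void main(String[] args) {
        String key = "mapreduce.map.memory.mb";

        // Two functions in the same job disagree; the one applied last wins.
        Consumer<Map<String, String>> first = conf -> conf.put(key, "4096");
        Consumer<Map<String, String>> second = conf -> conf.put(key, "2048");
        Map<String, String> conflicting =
                planJob(new HashMap<>(), Arrays.asList(first, second));
        System.out.println(conflicting.get(key)); // prints 2048

        // One pipeline-level value and no per-function overrides: nothing to clash.
        Map<String, String> pipelineConf = new HashMap<>();
        pipelineConf.put(key, "4096");
        Map<String, String> global = planJob(pipelineConf, Collections.emptyList());
        System.out.println(global.get(key)); // prints 4096
    }
}
```

In this model the first function's 4096 is silently replaced by 2048, which is the kind of conflict a pipeline plan can help spot. Setting the value once on the pipeline's configuration leaves nothing to collide.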

On Tue, Oct 13, 2015 at 1:08 PM, Micah Whitacre <mkwhitacre@gmail.com>
wrote:

> Yeah, I was confusing it with the setContext(...) method, which provides
> the configuration when the job is actually running.[1]  Luke, you might
> look at generating a plan of your pipeline to see what other DoFns might be
> inside the same job and causing a conflict with your settings.
>
> We typically use global settings rather than tweaking each DoFn, simply
> because it lets us avoid worrying about which DoFns get grouped into
> a single task and override each other.
>
> [1] -
> http://crunch.apache.org/apidocs/0.12.0/org/apache/crunch/DoFn.html#configure(org.apache.hadoop.conf.Configuration)
>
> On Tue, Oct 13, 2015 at 3:02 PM, Robinson, Landon - Landon <
> landon.t.robinson@lowes.com> wrote:
>
>> You can do it both ways: at the DoFn level or at the pipeline level.
>>
>> For global settings, go with the pipeline level. For individual
>> jobs/tasks, go with the DoFn level.
>>
>> *Pipeline Level:*
>>
>> Configuration crunchConf = getConf();
>> crunchConf.set("mapred.job.queue.name", "batch");
>> Pipeline pipeline = new MRPipeline(TransformKronosMR.class, "My Pipeline", crunchConf);
>>
>>
>> *DoFn Level (as mentioned):*
>>
>> @Override
>> public void configure(Configuration conf) {
>>   conf.set("mapreduce.map.java.opts", "-Xmx3900m");
>>   conf.set("mapreduce.reduce.java.opts", "-Xmx3900m");
>>
>>   conf.set("mapreduce.map.memory.mb", "4096");
>>   conf.set("mapreduce.reduce.memory.mb", "4096");
>> }
>>
>>
>>
>>
>> ---------------------------------------------------------------------------
>> Landon Robinson
>> Big Data/Hadoop Engineer
>> Lowe’s Companies Inc. | IT Business Intelligence
>>
>> ---------------------------------------------------------------------------
>>
>> From: Micah Whitacre <mkwhitacre@gmail.com>
>> Reply-To: "user@crunch.apache.org" <user@crunch.apache.org>
>> Date: Tuesday, October 13, 2015 at 3:55 PM
>> To: "user@crunch.apache.org" <user@crunch.apache.org>
>> Subject: Re: Hadoop Configuration from DoFn
>>
>> Luke,
>>   Generally that configuration should be set on the Configuration object
>> passed to Pipeline vs on the individual DoFns.  The configure(...) method
>> is called when re-instantiating the DoFn on the Map/Reduce task and at that
>> point those memory settings wouldn't be honored.
>>
>> On Tue, Oct 13, 2015 at 2:52 PM, Luke Hansen <luke@wealthfront.com>
>> wrote:
>>
>>> Does anyone know if this is the right way to configure Hadoop from a
>>> Crunch DoFn?  This didn't seem to affect anything.
>>>
>>> Thanks!
>>>
>>> @Override
>>> public void configure(Configuration conf) {
>>>   conf.set("mapreduce.map.java.opts", "-Xmx3900m");
>>>   conf.set("mapreduce.reduce.java.opts", "-Xmx3900m");
>>>
>>>   conf.set("mapreduce.map.memory.mb", "4096");
>>>   conf.set("mapreduce.reduce.memory.mb", "4096");
>>> }
>>>
>>>
>>
>
>
