crunch-user mailing list archives

From Bryan Baugher <bjb...@gmail.com>
Subject Re: Plan Dotfile in Configuration
Date Thu, 02 Jul 2015 14:13:13 GMT
The code throwing the exception was our code. We can get around it for now.
Just thought it was strange to see such a large config value.
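
A minimal sketch of one possible workaround, not necessarily the one used here: drop the 'crunch.planner.dotfile' entry before handing the configuration to the serializer. To stay self-contained it works on plain key/value entries rather than a Hadoop Configuration; the class name ConfigFilter and the method withoutDotfile are hypothetical and not part of Crunch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfigFilter {
    // Config key named in this thread; its value is the whole plan dotfile.
    static final String DOTFILE_KEY = "crunch.planner.dotfile";

    // Hypothetical helper: copies every entry except the dotfile one,
    // preserving the original iteration order.
    static Map<String, String> withoutDotfile(Iterable<Map.Entry<String, String>> entries) {
        Map<String, String> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : entries) {
            if (!DOTFILE_KEY.equals(e.getKey())) {
                filtered.put(e.getKey(), e.getValue());
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapreduce.job.name", "example");
        conf.put(DOTFILE_KEY, "digraph G { a -> b; }");
        // Only the non-dotfile entries remain for serialization.
        System.out.println(withoutDotfile(conf.entrySet()).keySet());
    }
}
```

The helper takes an Iterable of entries so that any source that iterates over its key/value pairs could be passed in the same way.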

On Thu, Jul 2, 2015 at 12:26 AM Gabriel Reid <gabriel.reid@gmail.com> wrote:

> From what I remember, the original intent of storing the dotfile in the
> Configuration object was:
> * to have a simple way of making it available to clients without putting
> anything dotfile-specific in an API (because dotfiles are only available
> for MR pipelines)
> * making it available as soon as MRPipeline.plan is called (which rules
> out putting it in the PipelineResult object)
> * also making it easy to get at after a full pipeline had been run
>
> Looking at how things are now since CRUNCH-418 and CRUNCH-438, I think
> that most of the above points are no longer valid. The dotfile can be
> retrieved via the MRExecutor that is returned from MRPipeline.plan, and can
> be automatically written to an output directory. The only reason to keep it
> around in the Configuration object is for backwards compatibility.
>
> What I would propose is that we deprecate
> PlanningParameters#PIPELINE_PLAN_DOTFILE, and remove it in an upcoming
> release. That means we probably still need to work around the issue that
> Bryan is encountering though.
>
> @Bryan, was the code throwing the exception your own code, or is there a
> hard limit in the Configuration class somewhere? My initial thought is that
> we could skip adding the dotfile to the Configuration that is serialized,
> and only add it when we return the Configuration from
> MRPipeline.getConfiguration.
>
> - Gabriel
>
>
>
> On Thu, Jul 2, 2015 at 12:05 AM Christian Tzolov <
> christian.tzolov@gmail.com> wrote:
>
>> Hi Bryan, Josh,
>>
>> IIRC this comes from the original dotfile jobplan implementation. I kept
>> it for backward compatibility. You can see that only the "jobplan" (i.e. the
>> original/main plan) is stored in the Configuration.
>>
>> +Gabriel I am not sure I remember the original intent of having the jobplan
>> stored in the Configuration?
>>
>>
>>
>>
>>
>> On Wed, Jul 1, 2015 at 11:02 PM, Josh Wills <josh.wills@gmail.com> wrote:
>>
>>> +Christian
>>>
>>> I'm not sure what the intent was there-- Christian?
>>>
>>> J
>>>
>>> On Wed, Jul 1, 2015 at 12:29 PM, Bryan Baugher <bjbq4d@gmail.com> wrote:
>>>
>>>> We recently ran into an issue where our code to serialize a pipeline's
>>>> configuration was throwing an exception because one of the key/values in
>>>> the config was too big (65k characters). We found this key/value was
>>>> 'crunch.planner.dotfile', which is included in the pipeline's config by
>>>> Crunch.
>>>>
>>>> My question is why does Crunch put this value into the config
>>>> object?
>>>>
>>>> Crunch saves the dotfile string in the MRExecutor context[1] and I
>>>> don't think any pipeline would need this at runtime. It also seems like
>>>> there are no references to this config value anywhere within Crunch other
>>>> than to write the value into the config object.
>>>>
>>>> [1] -
>>>> https://github.com/apache/crunch/blob/d176778cf803374506cb7743069a05e28e07e2cf/crunch-core/src/main/java/org/apache/crunch/impl/mr/plan/DotfileUtills.java#L139-L140
>>>>
>>>>
>>>
>>
