crunch-user mailing list archives

From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: Plan Dotfile in Configuration
Date Thu, 02 Jul 2015 05:26:16 GMT
From what I remember, the original intent of storing the dotfile in the
Configuration object was:
* to have a simple way of making it available to clients without putting
anything dotfile-specific in an API (because dotfiles are only available
for MR pipelines)
* to make it available as soon as MRPipeline.plan is called (which rules out
putting it in the PipelineResult object)
* to make it easy to get at after a full pipeline has been run

Looking at how things are now since CRUNCH-418 and CRUNCH-438, I think that
most of the above points are no longer valid. The dotfile can be retrieved
via the MRExecutor that is returned from MRPipeline.plan, and can be
automatically written to an output directory. The only reason to keep it
around in the Configuration object is for backwards compatibility.

What I would propose is that we deprecate
PlanningParameters#PIPELINE_PLAN_DOTFILE, and remove it in an upcoming
release. That means we probably still need to work around the issue that
Bryan is encountering though.

@Bryan, was the code throwing the exception your own code, or is there a
hard limit in the Configuration class somewhere? My initial thought is that
we could skip adding the dotfile to the Configuration that is serialized,
and only add it when we return the Configuration from
MRPipeline.getConfiguration.
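
To make that idea concrete, here is a minimal sketch. It is not Crunch code: a plain Map stands in for the Hadoop Configuration, and the class and method names (DotfileConfigSketch, forSerialization, forClient, canSerialize) are hypothetical. The thread does not say how the failing serializer works, so DataOutputStream.writeUTF is used here only as an assumed stand-in, because it rejects strings that encode to more than 65535 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; a Map stands in for org.apache.hadoop.conf.Configuration.
class DotfileConfigSketch {
    // Key name as quoted in Bryan's message.
    static final String DOTFILE_KEY = "crunch.planner.dotfile";

    // Copy of the settings without the dotfile: what would be serialized.
    static Map<String, String> forSerialization(Map<String, String> settings) {
        Map<String, String> copy = new LinkedHashMap<>(settings);
        copy.remove(DOTFILE_KEY);
        return copy;
    }

    // Copy with the dotfile attached: what a getConfiguration-style
    // accessor would hand back to clients.
    static Map<String, String> forClient(Map<String, String> settings, String dotfile) {
        Map<String, String> copy = new LinkedHashMap<>(settings);
        copy.put(DOTFILE_KEY, dotfile);
        return copy;
    }

    // Assumed serializer: writeUTF throws UTFDataFormatException when a
    // string encodes to more than 65535 bytes.
    static boolean canSerialize(Map<String, String> settings) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            for (Map.Entry<String, String> e : settings.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
            return true;
        } catch (UTFDataFormatException tooLong) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("example.key", "example value");

        // A made-up dotfile body that exceeds the writeUTF limit.
        char[] big = new char[70_000];
        Arrays.fill(big, 'x');
        String dotfile = new String(big);

        Map<String, String> clientView = forClient(settings, dotfile);
        System.out.println("client view serializable: " + canSerialize(clientView));
        System.out.println("stripped view serializable: "
                + canSerialize(forSerialization(clientView)));
    }
}
```

The point of the two copies is that the serialized settings never carry the dotfile, while anything read back through the accessor still sees it, so existing clients relying on the key keep working.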

- Gabriel



On Thu, Jul 2, 2015 at 12:05 AM Christian Tzolov <christian.tzolov@gmail.com>
wrote:

> Hi Bryan, Josh,
>
> IIRC this comes from the original dotfile jobplan implementation. I kept
> it for backward compatibility. You can see that only the "jobplan" (i.e. the
> original/main plan) is stored in the Configuration.
>
> +Gabriel I am not sure I remember the original intent behind having the
> jobplan stored in the Configuration.
>
> On Wed, Jul 1, 2015 at 11:02 PM, Josh Wills <josh.wills@gmail.com> wrote:
>
>> +Christian
>>
>> I'm not sure what the intent was there-- Christian?
>>
>> J
>>
>> On Wed, Jul 1, 2015 at 12:29 PM, Bryan Baugher <bjbq4d@gmail.com> wrote:
>>
>>> We recently ran into an issue where our code to serialize a pipeline's
>>> configuration was throwing an exception because one of the key/values in
>>> the config was too big (65k characters). We found this key/value was
>>> 'crunch.planner.dotfile', which is included in the pipeline's config by
>>> Crunch.
>>>
>>> My question is: why does Crunch put this value into the config
>>> object?
>>>
>>> Crunch saves the dotfile string in the MRExecutor context[1] and I don't
>>> think any pipeline would need this at runtime. It also seems like there are
>>> no references to this config value anywhere within Crunch other than to
>>> write the value into the config object.
>>>
>>> [1] -
>>> https://github.com/apache/crunch/blob/d176778cf803374506cb7743069a05e28e07e2cf/crunch-core/src/main/java/org/apache/crunch/impl/mr/plan/DotfileUtills.java#L139-L140
>>>
>>>
>>
>
