crunch-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: Plan Dotfile in Configuration
Date Thu, 02 Jul 2015 14:31:57 GMT
Ok, thanks for the info Bryan.

I'll log a jira ticket for making the change that I mentioned in my
previous mail.

- Gabriel


On Thu, Jul 2, 2015 at 4:13 PM Bryan Baugher <bjbq4d@gmail.com> wrote:

> The code throwing the exception was our code. We can get around it for
> now. Just thought it was strange to see such a large config value.
>
> On Thu, Jul 2, 2015 at 12:26 AM Gabriel Reid <gabriel.reid@gmail.com>
> wrote:
>
>> From what I remember, the original intent of storing the dotfile in the
>> Configuration object was:
>> * to have a simple way of making it available to clients without putting
>> anything dotfile-specific in an API (because dotfiles are only available
>> for MR pipelines)
>> * making it available as soon as MRPipeline.plan is called (which rules
>> out putting it in the PipelineResult object)
>> * also making it easy to get at after a full pipeline had been run
>>
>> Looking at how things are now since CRUNCH-418 and CRUNCH-438, I think
>> that most of the above points are no longer valid. The dotfile can be
>> retrieved via the MRExecutor that is returned from MRPipeline.plan, and can
>> be automatically written to an output directory. The only reason to keep it
>> around in the Configuration object is for backwards compatibility.
>>
>> What I would propose is that we deprecate
>> PlanningParameters#PIPELINE_PLAN_DOTFILE, and remove it in an upcoming
>> release. That means we probably still need to work around the issue that
>> Brian is encountering though.
>>
>> @Brian, was the code throwing the exception your own code, or is there a
>> hard limit in the Configuration class somewhere? My initial thought is that
>> we could skip adding the dotfile to the Configuration that is serialized,
>> and only add it when we return the Configuration from
>> MRPipeline.getConfiguration.
>>
>> - Gabriel
>>
>>
>>
>> On Thu, Jul 2, 2015 at 12:05 AM Christian Tzolov <
>> christian.tzolov@gmail.com> wrote:
>>
>>> Hi Bryan, Josh,
>>>
>>> IIRC this comes from the original dotfile jobplan implementation. I kept
>>> it for backward compatible. You can see that only the "jobplan" (e.g. the
>>> original/main plan) is stored in the Configuration.
>>>
>>> +Gabriel i am not sure I remember the original intent to have the
>>> jobplan stored in the Configuration?
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Jul 1, 2015 at 11:02 PM, Josh Wills <josh.wills@gmail.com>
>>> wrote:
>>>
>>>> +Christian
>>>>
>>>> I'm not sure what the intent was there-- Christian?
>>>>
>>>> J
>>>>
>>>> On Wed, Jul 1, 2015 at 12:29 PM, Bryan Baugher <bjbq4d@gmail.com>
>>>> wrote:
>>>>
>>>>> We recently ran into an issue where our code to serialize a pipeline's
>>>>> configuration was throwing an exception because one of the key/values
in
>>>>> the config was too big (65k characters). We found this key/value was
>>>>> 'crunch.planner.dotfile' which is included in the pipelines config from
>>>>> Crunch.
>>>>>
>>>>> My question is why does Crunch provide this value into the config
>>>>> object?
>>>>>
>>>>> Crunch saves the dotfile string in the MRExecutor context[1] and I
>>>>> don't think any pipeline would need this at runtime. It also seems like
>>>>> there are no references to this config value anywhere within Crunch other
>>>>> then to write the value into the config object.
>>>>>
>>>>> [1] -
>>>>> https://github.com/apache/crunch/blob/d176778cf803374506cb7743069a05e28e07e2cf/crunch-core/src/main/java/org/apache/crunch/impl/mr/plan/DotfileUtills.java#L139-L140
>>>>>
>>>>>
>>>>
>>>

Mime
View raw message