crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-269) Allow clients to disable deep copies on intermediate DoFn outputs
Date Sat, 21 Sep 2013 14:43:51 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773828#comment-13773828 ]

Gabriel Reid commented on CRUNCH-269:
-------------------------------------

Any idea how large the slowdown caused by the deep copying actually is? And is it specific
to pipelines that operate on large objects?

I ask because I'm wondering whether it would be worth disabling deep copying by default.
Deep copying is only needed if people modify objects in place and then pass them on to
other operations, which is probably not a great idea in general anyway (even if it is at
times very useful). If it carries a big performance hit (I've never profiled it to find
out), we might not want to do it by default, since it usually isn't needed. Of course,
that change could break some existing pipelines.
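To make the in-place modification hazard concrete, here is a minimal, self-contained Java sketch. It does not use Crunch's API: `DeepCopyDemo`, `fanOut` and `observedByReader` are hypothetical names that model one parent output fanned out to two child operations, the first of which mutates the object it receives while the second only reads it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical model, not Crunch's API: one parent output fanned out to several children.
public class DeepCopyDemo {

    // Hands `value` to each child in order, optionally giving each one its own deep copy.
    public static <T> void fanOut(T value, List<Consumer<T>> children,
                                  UnaryOperator<T> deepCopy, boolean copy) {
        for (Consumer<T> child : children) {
            child.accept(copy ? deepCopy.apply(value) : value);
        }
    }

    // Runs the scenario and returns what the read-only second child observed.
    public static List<String> observedByReader(boolean copy) {
        List<String> record = new ArrayList<>(List.of("a", "b"));
        List<String> seen = new ArrayList<>();
        List<Consumer<List<String>>> children = List.of(
                r -> r.add("mutated"),   // child 1 modifies the object in place
                r -> seen.addAll(r));    // child 2 only reads it
        // Copying the list counts as a deep copy here because String is immutable.
        fanOut(record, children, r -> new ArrayList<>(r), copy);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("with copies:    " + observedByReader(true));
        System.out.println("without copies: " + observedByReader(false));
    }
}
```

Running `main` should print `[a, b]` for the copied case and `[a, b, mutated]` for the shared case: without copies, the read-only child sees the other child's in-place change, which is exactly what deep copying protects against.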

+1 on the patch BTW.
                
> Allow clients to disable deep copies on intermediate DoFn outputs
> -----------------------------------------------------------------
>
>                 Key: CRUNCH-269
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-269
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>         Attachments: CRUNCH-269.patch
>
>
> I have a pipeline that operates on some large objects, and the additional overhead of
> creating a deep copy of them on intermediate outputs (i.e., DoFns with more than one child operation)
> when I know that all of their consumers are going to be read-only is slowing down my runtime
> quite a bit. I'd like to have an option that would allow me to disable intermediate deep copies
> on a DoFn-by-DoFn basis and/or across an entire pipeline run when I know that it's safe to
> do so.
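A rough sketch of the requested option, assuming (as the description says) that copies are only made for outputs feeding more than one child, and that either a per-function flag or a pipeline-wide switch may turn them off. `CopyPolicyDemo`, `Fn`, `skipDeepCopy`, `shouldCopy` and `emit` are illustrative names, not Crunch's actual API; the attached patch may implement this differently.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the requested option; names are illustrative, not Crunch's API.
public class CopyPolicyDemo {

    // Stand-in for a DoFn whose output may feed several child operations.
    public interface Fn<T> {
        // Per-function opt-out; the default keeps the safe, copying behavior.
        default boolean skipDeepCopy() { return false; }
    }

    // Copies are needed only for intermediate outputs (more than one child),
    // unless the function or the whole pipeline has opted out.
    public static boolean shouldCopy(Fn<?> parent, int childCount, boolean pipelineSkipsCopies) {
        if (childCount <= 1) {
            return false;
        }
        return !(pipelineSkipsCopies || parent.skipDeepCopy());
    }

    // Emits one value to every child and returns how many deep copies were made.
    public static <T> int emit(Fn<T> parent, T value, List<Consumer<T>> children,
                               UnaryOperator<T> deepCopy, boolean pipelineSkipsCopies) {
        boolean copy = shouldCopy(parent, children.size(), pipelineSkipsCopies);
        int copies = 0;
        for (Consumer<T> child : children) {
            if (copy) {
                child.accept(deepCopy.apply(value));
                copies++;
            } else {
                child.accept(value);
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        Fn<StringBuilder> defaultFn = new Fn<StringBuilder>() { };
        Fn<StringBuilder> optOutFn = new Fn<StringBuilder>() {
            @Override
            public boolean skipDeepCopy() { return true; }
        };
        List<Consumer<StringBuilder>> readers = List.of(sb -> { }, sb -> { });
        UnaryOperator<StringBuilder> copier = sb -> new StringBuilder(sb);
        StringBuilder value = new StringBuilder("payload");

        System.out.println(emit(defaultFn, value, readers, copier, false));                // 2
        System.out.println(emit(optOutFn, value, readers, copier, false));                 // 0
        System.out.println(emit(defaultFn, value, readers, copier, true));                 // 0
        System.out.println(emit(defaultFn, value, readers.subList(0, 1), copier, false));  // 0
    }
}
```

Defaulting `skipDeepCopy()` to `false` keeps today's safe behavior; flipping that default is the change floated in the comment, with the stated risk of breaking pipelines that modify objects in place.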

--
This message is automatically generated by JIRA.
