crunch-dev mailing list archives

From "Jason Gauci (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-320) Materialize several PObject & PCollection objects in parallel (deferred materialization)
Date Tue, 07 Jan 2014 21:16:50 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864697#comment-13864697 ]

Jason Gauci commented on CRUNCH-320:
------------------------------------

When a PCollection is materialized, is it held in RAM?  In our case the PCollection is
prohibitively large, but if materialize() relies on disk, this approach may work.

If I apply your patch, will it resolve all PObjects during pipeline.run()?  That would be
all I need to get around this issue.
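
To make the behaviour I am asking about concrete, here is a minimal sketch in plain Java. It is not the Crunch API and not the attached patch: DeferredPipeline, Deferred, and defer() are hypothetical names used only for illustration. Values are registered up front, computed in parallel inside run(), and read afterwards without further blocking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical stand-in for a pipeline; NOT the Crunch API. */
public class DeferredPipeline {

    /** Handle for a value that only becomes available after run(). */
    public static final class Deferred<T> {
        private final Supplier<T> work;
        private Future<T> result; // assigned by run()

        private Deferred(Supplier<T> work) {
            this.work = work;
        }

        /** Returns the computed value; fails if run() has not been called. */
        public T getValue() {
            if (result == null) {
                throw new IllegalStateException("call run() first");
            }
            try {
                return result.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
    }

    private final List<Deferred<?>> pending = new ArrayList<>();

    /** Marks a computation as "to be materialized" without running it. */
    public <T> Deferred<T> defer(Supplier<T> work) {
        Deferred<T> d = new Deferred<>(work);
        pending.add(d);
        return d;
    }

    /** Runs every pending computation in parallel and waits for all of them. */
    public void run() {
        if (pending.isEmpty()) {
            return;
        }
        ExecutorService pool = Executors.newFixedThreadPool(pending.size());
        try {
            for (Deferred<?> d : pending) {
                submit(pool, d);
            }
            pool.shutdown();
            pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
            pending.clear();
        }
    }

    // Helper that captures the wildcard type so the Future is typed correctly.
    private static <T> void submit(ExecutorService pool, Deferred<T> d) {
        Callable<T> task = d.work::get;
        d.result = pool.submit(task);
    }

    public static void main(String[] args) {
        DeferredPipeline pipeline = new DeferredPipeline();
        Deferred<Long> count = pipeline.defer(() -> 3L);
        Deferred<String> first = pipeline.defer(() -> "a");
        pipeline.run(); // both values are computed here, in parallel
        System.out.println(count.getValue() + " " + first.getValue());
    }
}
```

In this sketch getValue() throws if run() has not been called yet, which differs from the on-demand blocking that foo.getValue() does today; whether the patch behaves that way is exactly my question.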

> Materialize several PObject & PCollection objects in parallel (deferred materialization)
> ----------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-320
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-320
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jason Gauci
>            Assignee: Josh Wills
>         Attachments: CRUNCH-320.patch
>
>
> Currently, Crunch blocks and materializes PCollections (through foo.materialize()) and
> PObjects (through foo.getValue()) on demand, but it would be a significant performance improvement
> if we could mark several of these objects as to be materialized, and then materialize all
> of them in parallel as part of a pipeline.run() call.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
