crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-479) Writing to target with WriteMode.APPEND merges values into PCollection
Date Mon, 27 Oct 2014 15:39:34 GMT


Micah Whitacre updated CRUNCH-479:
    Attachment: CRUNCH-479.patch

Here is a test showing the behavior...

I believe the issue is that when we "write", the "materializedAt" value is set on the PCollectionImpl[1]
instance and points to the target directory.  A later materialize therefore reads everything in that
directory, including the values that existed before the append.  Not sure of the best way to fix this yet.

[1] -
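
To make the suspected mechanism concrete, here is a small toy model in plain Java. It is not Crunch code: ToyCollection, DIRS, writeAppend and demo are hypothetical names invented for illustration, and the only thing taken from the discussion above is the idea that a "materializedAt" pointer set during the write causes a later materialize to read the whole target directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AppendMaterializeSketch {
    // Hypothetical stand-in for a file system: directory path -> records stored there.
    static final Map<String, List<String>> DIRS = new HashMap<>();

    // Toy stand-in for a collection implementation; NOT Crunch's PCollectionImpl.
    static class ToyCollection {
        private final List<String> values;
        private String materializedAt; // assumed: set as a side effect of writing

        ToyCollection(List<String> values) {
            this.values = new ArrayList<>(values);
        }

        // Append this collection's values to the directory and remember the directory.
        void writeAppend(String dir) {
            DIRS.computeIfAbsent(dir, d -> new ArrayList<>()).addAll(values);
            materializedAt = dir;
        }

        // If a materialized location is known, read everything found there.
        List<String> materialize() {
            if (materializedAt != null) {
                return new ArrayList<>(DIRS.get(materializedAt));
            }
            return new ArrayList<>(values);
        }
    }

    // Runs the scenario: the target already holds two records, one new record is appended.
    static List<String> demo() {
        DIRS.clear();
        DIRS.put("/out", new ArrayList<>(Arrays.asList("old1", "old2")));
        ToyCollection collection = new ToyCollection(Arrays.asList("new1"));
        collection.writeAppend("/out");
        return collection.materialize();
    }

    public static void main(String[] args) {
        // Prints [old1, old2, new1]: the pre-existing records leak into the result.
        System.out.println(demo());
    }
}
```

In this model the caller would expect only [new1] back. One possible direction, again only within the toy model, would be to leave materializedAt unset when the write mode is an append, but whether that fits the real code is an open question.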

> Writing to target with WriteMode.APPEND merges values into PCollection
> ----------------------------------------------------------------------
>                 Key: CRUNCH-479
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>            Reporter: Micah Whitacre
>            Assignee: Josh Wills
>         Attachments: CRUNCH-479.patch
> This was mentioned as part of CDK-617[1].  When a PCollection that contains a set of values
> is written to a target with WriteMode.APPEND and is then materialized, iterating over it
> returns not only the new values that were appended but also the values already in the target.
> This is surprising, as most would expect the collection to contain only its original values.
> One use case is a pipeline that wants to process only the new values instead of dealing with
> all of the existing data.
> [1] -

