crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-479) Writing to target with WriteMode.APPEND merges values into PCollection
Date Mon, 27 Oct 2014 20:44:35 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills updated CRUNCH-479:
------------------------------
    Attachment: CRUNCH-479b.patch

[~mkwhitacre] fix for this, along with a few subtle improvements to your test code.

> Writing to target with WriteMode.APPEND merges values into PCollection
> ----------------------------------------------------------------------
>
>                 Key: CRUNCH-479
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-479
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>            Reporter: Micah Whitacre
>            Assignee: Josh Wills
>         Attachments: CRUNCH-479.patch, CRUNCH-479b.patch
>
>
> This was mentioned as part of CDK-617[1].  When a PCollection that contains a set of values
> is written to a target with WriteMode.APPEND and then materialized, iterating over it returns
> not only the new values that were appended but also the values that already existed in the
> target.  This is surprising, as most users would expect the collection to contain only its
> original values.  One use case for the expected behavior is a pipeline that wants to process
> only the new values instead of dealing with all of the existing data.
> [1] - https://issues.cloudera.org/browse/CDK-671
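
To make the reported behavior concrete, here is a toy model in plain Java. It does not use the Crunch API and is not taken from either patch: the class AppendMergeSketch, the nested Target class, and both method names are hypothetical, and the idea that materialization re-reads the whole target is only an assumption used to picture the symptom described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AppendMergeSketch {

    /** Hypothetical stand-in for an output location that already holds records. */
    static final class Target {
        final List<String> records = new ArrayList<>();

        Target(List<String> existing) {
            records.addAll(existing);
        }

        /** Models an append-style write: existing records are kept, new ones are added. */
        void append(List<String> values) {
            records.addAll(values);
        }
    }

    /**
     * Models the reported symptom (assumption: the materialized view is backed
     * by everything now stored at the target, not by the collection itself).
     */
    static List<String> writeThenMaterializeFromTarget(List<String> collection, Target target) {
        target.append(collection);
        return new ArrayList<>(target.records);
    }

    /** Models the expected result: only the collection's own values come back. */
    static List<String> writeThenMaterializeOwnValues(List<String> collection, Target target) {
        target.append(collection);
        return new ArrayList<>(collection);
    }

    public static void main(String[] args) {
        List<String> existing = Arrays.asList("old-1", "old-2");
        List<String> fresh = Arrays.asList("new-1");

        // Prints [old-1, old-2, new-1]: pre-existing records leak into the result.
        System.out.println(writeThenMaterializeFromTarget(fresh, new Target(existing)));

        // Prints [new-1]: only the values that were in the collection.
        System.out.println(writeThenMaterializeOwnValues(fresh, new Target(existing)));
    }
}
```

In both variants the target ends up holding all three records; the difference is only in what the caller sees when iterating, which is the distinction the report draws.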



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
