Return-Path:
X-Original-To: apmail-incubator-crunch-commits-archive@minotaur.apache.org
Delivered-To: apmail-incubator-crunch-commits-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id 98CF3CADB
    for ; Thu, 28 Jun 2012 14:50:03 +0000 (UTC)
Received: (qmail 58788 invoked by uid 500); 28 Jun 2012 14:50:03 -0000
Delivered-To: apmail-incubator-crunch-commits-archive@incubator.apache.org
Received: (qmail 58756 invoked by uid 500); 28 Jun 2012 14:50:03 -0000
Mailing-List: contact crunch-commits-help@incubator.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: crunch-dev@incubator.apache.org
Delivered-To: mailing list crunch-commits@incubator.apache.org
Received: (qmail 58747 invoked by uid 99); 28 Jun 2012 14:50:03 -0000
Received: from tyr.zones.apache.org (HELO tyr.zones.apache.org) (140.211.11.114)
    by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 28 Jun 2012 14:50:03 +0000
Received: by tyr.zones.apache.org (Postfix, from userid 65534)
    id 1109A8451; Thu, 28 Jun 2012 14:50:03 +0000 (UTC)
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: tzolov@apache.org
To: crunch-commits@incubator.apache.org
X-Mailer: ASF-Git Admin Mailer
Subject: git commit: Document how the Hadoop Reducer implementation impact the DoFn#process() semantics
Message-Id: <20120628145003.1109A8451@tyr.zones.apache.org>
Date: Thu, 28 Jun 2012 14:50:03 +0000 (UTC)

Updated Branches:
  refs/heads/master 25f328044 -> 1d020be25


Document how the Hadoop Reducer implementation impact the DoFn#process() semantics


Project: http://git-wip-us.apache.org/repos/asf/incubator-crunch/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-crunch/commit/1d020be2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-crunch/tree/1d020be2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-crunch/diff/1d020be2

Branch: refs/heads/master
Commit: 1d020be259b6b23b1a5ebd0613637f50bf291dc2
Parents: 25f3280
Author: Christian Tzolov
Authored: Thu Jun 28 16:47:14 2012 +0200
Committer: Christian Tzolov
Committed: Thu Jun 28 16:47:14 2012 +0200

----------------------------------------------------------------------
 src/main/java/com/cloudera/crunch/DoFn.java | 14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-crunch/blob/1d020be2/src/main/java/com/cloudera/crunch/DoFn.java
----------------------------------------------------------------------
diff --git a/src/main/java/com/cloudera/crunch/DoFn.java b/src/main/java/com/cloudera/crunch/DoFn.java
index b45f6aa..b1bbb73 100644
--- a/src/main/java/com/cloudera/crunch/DoFn.java
+++ b/src/main/java/com/cloudera/crunch/DoFn.java
@@ -52,8 +52,18 @@ public abstract class DoFn implements Serializable {
   /**
    * Processes the records from a {@link PCollection}.
    *
-   * @param input The input record
-   * @param emitter The emitter to send the output to
+   *
+   *
+   * Note: Crunch can reuse a single input record object whose content
+   * changes on each {@link #process(Object, Emitter)} method call. This
+   * functionality is imposed by Hadoop's Reducer implementation:
+   * The framework will reuse the key and value objects that are passed into the reduce, therefore the application
+   * should clone the objects they want to keep a copy of.
+   *
+   * @param input
+   *          The input record.
+   * @param emitter
+   *          The emitter to send the output to
    */
   public abstract void process(S input, Emitter emitter);
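
The object-reuse note in the Javadoc above can be illustrated with a small standalone sketch. It does not use the Crunch or Hadoop APIs: ReuseDemo, Record, keepReferences and keepCopies are hypothetical names invented for this illustration, and the loop only imitates a framework that overwrites one input instance between process() calls.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {

  /** Hypothetical mutable record, standing in for an input object the framework reuses. */
  static final class Record {
    String value;

    Record(String value) {
      this.value = value;
    }

    Record copy() {
      return new Record(value);
    }
  }

  /** Stores the reference it is handed: every stored entry ends up showing the last value. */
  static List<String> keepReferences(String[] inputs) {
    Record reused = new Record(null);
    List<Record> kept = new ArrayList<>();
    for (String s : inputs) {
      reused.value = s;  // the "framework" overwrites the same instance
      kept.add(reused);  // the "process" call keeps the reference
    }
    return values(kept);
  }

  /** Copies the record before storing it, so each entry keeps its own value. */
  static List<String> keepCopies(String[] inputs) {
    Record reused = new Record(null);
    List<Record> kept = new ArrayList<>();
    for (String s : inputs) {
      reused.value = s;
      kept.add(reused.copy());
    }
    return values(kept);
  }

  private static List<String> values(List<Record> records) {
    List<String> out = new ArrayList<>();
    for (Record r : records) {
      out.add(r.value);
    }
    return out;
  }

  public static void main(String[] args) {
    String[] inputs = {"a", "b", "c"};
    System.out.println(keepReferences(inputs)); // prints [c, c, c]
    System.out.println(keepCopies(inputs));     // prints [a, b, c]
  }
}
```

In a real DoFn the equivalent of copy() depends on the record type in use, so the copying step here should be read as a placeholder rather than as Crunch's own mechanism.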