Subject: Re: Concerning the use of the Iterable parameter to CombineFn
From: Gabriel Reid
Date: Fri, 5 Apr 2013 21:40:03 +0200
To: user@crunch.apache.org

Hi Chad,

Good point -- I know that this has tripped people up in the past. I think that definitely documenting this and possibly enforcing it sounds like a good idea -- I've logged a ticket in JIRA (with the content of your mail), see https://issues.apache.org/jira/browse/CRUNCH-192

- Gabriel

On 05 Apr 2013, at 21:30, Chad Urso McDaniel wrote:

> BLUF: The Iterable parameter to CombineFn.process implies you can iterate multiple times when you cannot, and this leads to surprising behavior.
>
> As many of you probably know, the signature of CombineFn.process is
> ---
> process(Pair<S, Iterable<T>> input, Emitter<Pair<S, T>> emitter)
> ---
>
> The corresponding Hadoop Reducer signature is
> ---
> reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output, Reporter reporter)
> ---
>
> I assume the Crunch use of Iterable is for convenient use in "for" loops.
>
> Unfortunately, this Iterable seems to return the same Iterator object each time Iterable.iterator() is called.
>
> This makes sense to me based on the underlying Hadoop MapReduce, but violates what I think most people expect from the Iterable interface.
>
> I understand that it's too late to change the interface, but could we at least have a Javadoc note, or an exception thrown if the Iterable is used more than once?
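
To make the behavior described above concrete, here is a minimal plain-Java sketch. It is not Crunch's actual implementation: the class and method names (SinglePassIterableSketch, sharedIterator, failOnSecondPass, sum) are hypothetical and exist only for illustration. The first wrapper imitates an Iterable that hands back the same Iterator on every call; the second shows one way the suggested "throw if used more than once" check might look.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; none of these names come from Crunch.
public class SinglePassIterableSketch {

    // Imitates the reported behavior: every iterator() call returns the
    // same underlying Iterator, so a second loop sees no elements.
    public static <T> Iterable<T> sharedIterator(final Iterator<T> source) {
        return new Iterable<T>() {
            @Override
            public Iterator<T> iterator() {
                return source;
            }
        };
    }

    // One possible enforcement (an assumption, not the CRUNCH-192 fix):
    // fail loudly the second time iterator() is requested.
    public static <T> Iterable<T> failOnSecondPass(final Iterator<T> source) {
        return new Iterable<T>() {
            private boolean handedOut = false;

            @Override
            public Iterator<T> iterator() {
                if (handedOut) {
                    throw new IllegalStateException(
                        "This Iterable supports only a single pass");
                }
                handedOut = true;
                return source;
            }
        };
    }

    // Sums the values with an ordinary for-each loop.
    public static int sum(Iterable<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3);

        Iterable<Integer> shared = sharedIterator(data.iterator());
        System.out.println(sum(shared)); // first pass consumes the iterator
        System.out.println(sum(shared)); // second pass silently sees nothing

        Iterable<Integer> guarded = failOnSecondPass(data.iterator());
        System.out.println(sum(guarded));
        try {
            sum(guarded);
        } catch (IllegalStateException e) {
            System.out.println("second pass rejected: " + e.getMessage());
        }
    }
}
```

With the shared-iterator wrapper the second loop silently yields nothing, which is the kind of surprise described in the mail. With the guarded wrapper the second loop fails immediately instead. Code that really needs two passes would have to copy the values into a collection first.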