From: "Jeff Eastman"
Reply-To: core-user@hadoop.apache.org
Subject: RE: Best Practice?
Date: Mon, 11 Feb 2008 12:24:50 -0800
In-Reply-To: <653CA365-A5C8-4DF0-A0BE-297DB16BE69A@yahoo-inc.com>
References: <47AE3A85.7010801@cs.washington.edu> <653CA365-A5C8-4DF0-A0BE-297DB16BE69A@yahoo-inc.com>

Hi Owen,

Thanks for the information.
I took Ted's advice and refactored my mapper to use a combiner, which solved my front-end canopy generation problem. I still have to output the final canopies in the reducer during close(), though, since there is no similar combiner mechanism. I was worried about this, but now I won't be.

Thanks,
Jeff

-----Original Message-----
From: Owen O'Malley [mailto:oom@yahoo-inc.com]
Sent: Monday, February 11, 2008 10:40 AM
To: core-user@hadoop.apache.org
Subject: Re: Best Practice?

On Feb 9, 2008, at 4:21 PM, Jeff Eastman wrote:

> I'm trying to wait until close() to output the cluster centroids to the
> reducer, but the OutputCollector is not available.

You hit on exactly the right solution. Actually, because of Pipes and Streaming, you have a lot more guarantees than you would expect. In particular, you can call output.collect when the framework is between calls to map or reduce up until the close finishes.

-- Owen
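
The approach discussed in this thread presumably amounts to keeping a reference to the OutputCollector handed to reduce() and calling collect() on it from close(), which takes no collector argument. A minimal sketch of that idea follows. To stay runnable without Hadoop on the classpath, OutputCollector here is a one-method stand-in for org.apache.hadoop.mapred.OutputCollector and the Reporter argument is dropped. CanopyReducerSketch, the t2 threshold, the one-dimensional points, and the "canopy-N" keys are all hypothetical and are not Jeff's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in for org.apache.hadoop.mapred.OutputCollector (assumption: only
// collect() is needed here), so the sketch compiles without Hadoop.
interface OutputCollector<K, V> {
    void collect(K key, V value);
}

// Hypothetical reducer that buffers canopy centers and emits them in close().
class CanopyReducerSketch {
    private final double t2;                          // hypothetical distance threshold
    private final List<Double> canopies = new ArrayList<>();
    private OutputCollector<String, Double> output;   // saved from the latest reduce() call

    CanopyReducerSketch(double t2) {
        this.t2 = t2;
    }

    // Same shape as reduce() in the old mapred API, minus the Reporter.
    void reduce(String key, Iterator<Double> values,
                OutputCollector<String, Double> output) {
        this.output = output;                         // remember the collector for close()
        while (values.hasNext()) {
            double point = values.next();
            boolean covered = false;
            for (double center : canopies) {
                if (Math.abs(point - center) < t2) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                canopies.add(point);                  // point starts a new canopy
            }
        }
    }

    // Nothing is emitted until here; uses the collector saved in reduce().
    void close() {
        if (output == null) {
            return;                                   // reduce() was never called
        }
        for (int i = 0; i < canopies.size(); i++) {
            output.collect("canopy-" + i, canopies.get(i));
        }
    }
}

class CanopyDemo {
    // Drives the reducer by hand, the way the framework would: reduce calls, then close.
    static List<String> run() {
        List<String> emitted = new ArrayList<>();
        OutputCollector<String, Double> collector = (k, v) -> emitted.add(k + "=" + v);
        CanopyReducerSketch reducer = new CanopyReducerSketch(1.0);
        reducer.reduce("centroid", Arrays.asList(0.0, 0.4, 5.0).iterator(), collector);
        reducer.reduce("centroid", Arrays.asList(5.3, 9.0).iterator(), collector);
        reducer.close();
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In a real job the class would implement the mapred Reducer interface (typically by extending MapReduceBase) and the framework, not a driver like CanopyDemo, would make the reduce() and close() calls. The null check covers a task that receives no input, in which case there is no collector to write to.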