Subject: Re: C++ pipes on full (nonpseudo) cluster
From: Gianluigi Zanetti
Reply-To: gianluigi.zanetti@crs4.it
To: common-user@hadoop.apache.org
Organization: CRS4
Date: Tue, 30 Mar 2010 22:43:37 +0200

Hello.
Did you try following the tutorial in
http://wiki.apache.org/hadoop/C++WordCount ?
We use C++ pipes in production on a large cluster, and it works.

--gianluigi

On Tue, 2010-03-30 at 13:28 -0700, Keith Wiley wrote:
> No responses yet, although I admit it's only been a few hours.
>
> As a follow-up, permit me to pose the following question:
>
> Is it, in fact, impossible to run C++ pipes on a fully-distributed system (as opposed to a pseudo-distributed system)? I haven't found any definitive clarification on this topic one way or the other. The only statement that I found in the least bit illuminating is in the O'Reilly book (not official Hadoop documentation mind you), p.38, which states:
>
> "To run a Pipes job, we need to run Hadoop in pseudo-distributed mode...Pipes doesn't run in standalone (local) mode, since it relies on Hadoop's distributed cache mechanism, which works only when HDFS is running."
>
> The phrasing of those statements is a little unclear in that the distinction being made appears to be between standalone and pseudo-distributed mode, without any specific reference to fully-distributed mode. Namely, the section that qualifies the need for pseudo-distributed mode (the need for HDFS) would obviously also apply to full distributed mode despite the lack of mention of fully distributed mode in the quoted section. So can pipes run in fully distributed mode or not?
>
> Bottom line, I can't get C++ pipes to work on a fully distributed cluster yet and I don't know if I am wasting my time, if this is a truly impossible effort or if it can be done and I simply haven't figured out how to do it yet.
>
> Thanks for any help.
>
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     www.keithwiley.com
>
> "The easy confidence with which I know another man's religion is folly teaches
> me to suspect that my own is also."
>   -- Mark Twain
> ________________________________________________________________________________