Subject: Re: Streaming Hadoop using C
From: Charles Earl
Date: Wed, 29 Feb 2012 16:03:36 -0500
To: common-user@hadoop.apache.org

The documentation on Starfish (http://www.cs.duke.edu/starfish/index.html)
looks promising, though I have not used it. I wonder if others on the list
have found it more useful than setting mapred.task.profile.
C

On Feb 29, 2012, at 3:53 PM, Mark question wrote:

> I've used Hadoop profiling (.prof) to show the stack trace, but it was hard
> to follow. I used jConsole locally, since I couldn't find a way to set a port
> number for child processes when running them remotely. Linux tools
> (top, /proc) showed me that the virtual memory is almost twice my
> physical memory, which means swapping is happening, which is what I'm
> trying to avoid.
>
> So basically, is there a way to assign a port to child processes to monitor
> them remotely (asked before by Xun), or would you recommend another
> monitoring tool?
>
> Thank you,
> Mark
>
>
> On Wed, Feb 29, 2012 at 11:35 AM, Charles Earl wrote:
>
>> Mark,
>> So if I understand, it is more the memory management that you are
>> interested in, rather than a need to run an existing C or C++ application
>> on the MapReduce platform?
>> Have you done profiling of the application?
>> C
>> On Feb 29, 2012, at 2:19 PM, Mark question wrote:
>>
>>> Thanks Charles. I'm running Hadoop for research, to run duplicate
>>> detection methods. To go deeper, I need to understand what's slowing my
>>> program, which usually starts with analyzing memory to predict the best
>>> input size for a map task. So you're saying piping can help me control
>>> memory even though it's eventually running on a VM?
>>>
>>> Thanks,
>>> Mark
>>>
>>> On Wed, Feb 29, 2012 at 11:03 AM, Charles Earl wrote:
>>>
>>>> Mark,
>>>> Both streaming and pipes allow this, perhaps more so pipes, at the level
>>>> of the MapReduce task. Can you provide more details on the application?
>>>> On Feb 29, 2012, at 1:56 PM, Mark question wrote:
>>>>
>>>>> Hi guys, thought I should ask this before I use it ... will using C over
>>>>> Hadoop give me the usual C memory management? For example, malloc(),
>>>>> sizeof()? My guess is no, since this will all eventually be turned into
>>>>> bytecode, but I need more control over memory, which is obviously hard
>>>>> for me to do with Java.
>>>>>
>>>>> Let me know of any advantages you know about streaming in C over Hadoop.
>>>>> Thank you,
>>>>> Mark
>>>>
>>>>
>>
>>
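
On the original question in the thread (whether C under Hadoop still gets the
usual malloc()/free()): Hadoop Streaming starts the mapper as a separate
process that reads records on stdin and writes them on stdout, so a C mapper
runs as native code rather than bytecode. A minimal sketch of what that could
look like follows. The function names read_line and map_stream and the
"line<TAB>1" output format are made up for illustration and are not taken
from the thread or from any Hadoop API.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line of any length into a heap buffer that
 * grows with realloc(). Returns the line length (newline stripped), or -1 at
 * end of input or if allocation fails. *buf and *cap persist across calls;
 * the caller frees *buf. */
static long read_line(FILE *in, char **buf, size_t *cap)
{
    size_t len = 0;
    int c;

    if (*buf == NULL) {
        *buf = malloc(64);
        if (*buf == NULL)
            return -1;
        *cap = 64;
    }
    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (len + 1 >= *cap) {              /* keep room for the final '\0' */
            char *tmp = realloc(*buf, *cap * 2);
            if (tmp == NULL)
                return -1;
            *buf = tmp;
            *cap *= 2;
        }
        (*buf)[len++] = (char)c;
    }
    if (c == EOF && len == 0)
        return -1;                          /* nothing left to read */
    (*buf)[len] = '\0';
    return (long)len;
}

/* Hypothetical mapper body: for every input line, emit "<line>\t1\n".
 * Returns the number of records written. */
long map_stream(FILE *in, FILE *out)
{
    char *buf = NULL;
    size_t cap = 0;
    long records = 0;

    while (read_line(in, &buf, &cap) >= 0) {
        fprintf(out, "%s\t1\n", buf);
        records++;
    }
    free(buf);                              /* memory is managed by hand */
    return records;
}
```

With a main() that just calls map_stream(stdin, stdout), the compiled binary
could be handed to the streaming jar through its -mapper option. The memory it
allocates belongs to that child process, not to the task JVM's heap, so it
would show up separately in top or /proc.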