Mailing-List: contact common-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: common-user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of charles.cearl@gmail.com
 designates 209.85.160.176 as permitted sender)
Received-SPF: pass (google.com: domain of charles.cearl@gmail.com designates
 10.236.154.137 as permitted sender) client-ip=10.236.154.137;
Content-Type: text/plain; charset=iso-8859-1
Mime-Version: 1.0 (Apple Message framework v1257)
Subject: Re: Streaming Hadoop using C
From: Charles Earl <charles.cearl@gmail.com>
In-Reply-To: 
 <CALR_EG+5qdeo2DDC8A3uq9MM3XKSWZh1aYnCzyq1qrxd16_SQA@mail.gmail.com>
Date: Wed, 29 Feb 2012 16:03:36 -0500
Content-Transfer-Encoding: quoted-printable
Message-Id: <FDAAAB7C-7237-43B8-A202-3C437C9D160A@gmail.com>
References: 
 <CALR_EGL3tbs-=5i9gWuF3jDq75bu6nvNn3y8wSu44Ok_T57EQw@mail.gmail.com>
 <36F15362-7A85-4B3F-AD74-7A9D7B30EA7A@gmail.com>
 <CALR_EGKL1iAauSRkyYa0jowMZcs_Po2wgECPOcYs+RRCQDAinQ@mail.gmail.com>
 <6F1D3F85-782C-4EF4-A02E-D439A07E3668@gmail.com>
 <CALR_EG+5qdeo2DDC8A3uq9MM3XKSWZh1aYnCzyq1qrxd16_SQA@mail.gmail.com>
To: common-user@hadoop.apache.org

The documentation on Starfish (http://www.cs.duke.edu/starfish/index.html)
looks promising, though I have not used it. I wonder if others on the list
have found it more useful than setting mapred.task.profile.
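
For the top and /proc figures quoted below, a cheap cross-check may help
before reaching for a full profiler. A virtual size well above physical RAM
does not by itself show swapping, since VmSize counts reserved and mapped
address space; VmRSS and, where the kernel reports it, VmSwap are closer to
the point. Here is a minimal sketch in C that pulls those three fields out of
/proc/<pid>/status. The struct and function names are made up for
illustration, and it assumes a Linux node where that file is readable.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the fields of interest (all in kB). */
struct vm_stats {
    long size_kb;  /* VmSize: total virtual address space */
    long rss_kb;   /* VmRSS: pages currently resident in RAM */
    long swap_kb;  /* VmSwap: swapped-out pages; -1 if the field is absent */
};

/* Parse the Vm* fields from a stream laid out like /proc/<pid>/status.
 * Absent fields stay at -1. Returns 0 if VmSize and VmRSS were both
 * found, -1 otherwise. */
int read_vm_stats(FILE *f, struct vm_stats *st)
{
    char line[256];

    assert(f != NULL && st != NULL);
    st->size_kb = st->rss_kb = st->swap_kb = -1;

    while (fgets(line, sizeof line, f) != NULL) {
        long kb;
        if (sscanf(line, "VmSize: %ld kB", &kb) == 1)
            st->size_kb = kb;
        else if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
            st->rss_kb = kb;
        else if (sscanf(line, "VmSwap: %ld kB", &kb) == 1)
            st->swap_kb = kb;
    }
    return (st->size_kb >= 0 && st->rss_kb >= 0) ? 0 : -1;
}

/* Convenience wrapper: open /proc/<pid>/status for a given process id. */
int read_vm_stats_for_pid(long pid, struct vm_stats *st)
{
    char path[64];
    FILE *f;
    int rc;

    snprintf(path, sizeof path, "/proc/%ld/status", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    rc = read_vm_stats(f, st);
    fclose(f);
    return rc;
}
```

Calling read_vm_stats_for_pid() with the pid of a child task JVM every few
seconds and logging the three numbers should show whether resident memory is
actually approaching physical RAM, or whether only the virtual size is large.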
C
On Feb 29, 2012, at 3:53 PM, Mark question wrote:

> I've used Hadoop profiling (.prof) to show the stack trace, but it was hard
> to follow. I ran jConsole locally, since I couldn't find a way to assign a
> port number to child processes when running them remotely. Linux tools
> (top, /proc) showed me that virtual memory is almost twice my physical
> memory, which means swapping is happening, and that is what I'm trying to
> avoid.
>
> So basically, is there a way to assign a port to child processes to monitor
> them remotely (asked before by Xun), or would you recommend another
> monitoring tool?
>
> Thank you,
> Mark
>
>
> On Wed, Feb 29, 2012 at 11:35 AM, Charles Earl <charles.cearl@gmail.com> wrote:
>
>> Mark,
>> So if I understand, it is more the memory management that you are
>> interested in, rather than a need to run an existing C or C++ application
>> on the MapReduce platform?
>> Have you profiled the application?
>> C
>> On Feb 29, 2012, at 2:19 PM, Mark question wrote:
>>
>>> Thanks, Charles. I'm running Hadoop for research, to run duplicate
>>> detection methods. To go deeper, I need to understand what's slowing my
>>> program down, which usually starts with analyzing memory to predict the
>>> best input size for a map task. So you're saying piping can help me
>>> control memory even though it eventually runs on a VM?
>>>
>>> Thanks,
>>> Mark
>>>
>>> On Wed, Feb 29, 2012 at 11:03 AM, Charles Earl <charles.cearl@gmail.com> wrote:
>>>
>>>> Mark,
>>>> Both streaming and pipes allow this, perhaps more so pipes at the level
>>>> of the mapreduce task. Can you provide more details on the application?
>>>> On Feb 29, 2012, at 1:56 PM, Mark question wrote:
>>>>
>>>>> Hi guys, thought I should ask this before I use it ... will using C
>>>>> over Hadoop give me the usual C memory management, for example
>>>>> malloc() and sizeof()? My guess is no, since this will all eventually be
>>>>> turned into bytecode, but I need more control over memory, which is
>>>>> obviously hard for me to do with Java.
>>>>>
>>>>> Let me know of any advantages you know of to streaming in C over Hadoop.
>>>>> Thank you,
>>>>> Mark
>>>>
>>>>
>>
>>
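
P.S. On the original question at the bottom of the thread: with Hadoop
Streaming the mapper is an ordinary executable that reads records on stdin
and writes tab-separated key/value lines on stdout, so a C mapper runs as its
own native process and malloc(), realloc() and free() behave as in any other
C program. A minimal sketch, assuming one record per line and a reducer that
sums the counts per key to spot duplicates; the names read_line and
map_stream are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one line of any length into a heap buffer that grows on demand.
 * *buf and *cap persist across calls so the buffer is reused.
 * Returns 1 if a line was read, 0 at end of input, -1 if out of memory. */
static int read_line(FILE *in, char **buf, size_t *cap)
{
    size_t len = 0;
    int c;

    if (*buf == NULL) {
        *cap = 64;                      /* initial size, doubled as needed */
        *buf = malloc(*cap);
        if (*buf == NULL)
            return -1;
    }
    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (len + 1 >= *cap) {          /* keep one byte for the terminator */
            size_t new_cap = *cap * 2;
            char *tmp = realloc(*buf, new_cap);
            if (tmp == NULL)
                return -1;
            *buf = tmp;
            *cap = new_cap;
        }
        (*buf)[len++] = (char)c;
    }
    (*buf)[len] = '\0';
    if (c == EOF && len == 0)
        return 0;
    return 1;
}

/* Emit "<record>\t1" for every non-empty input line.
 * Returns 0 on success, 1 if memory ran out. */
int map_stream(FILE *in, FILE *out)
{
    char *line = NULL;
    size_t cap = 0;
    int rc;

    assert(in != NULL && out != NULL);
    while ((rc = read_line(in, &line, &cap)) == 1) {
        if (line[0] != '\0')
            fprintf(out, "%s\t1\n", line);
    }
    free(line);
    return rc == 0 ? 0 : 1;
}
```

Wrapping this in a main() that returns map_stream(stdin, stdout) gives an
executable that can be handed to the streaming jar's -mapper option. The line
buffer starts at 64 bytes and doubles as needed, so the mapper's peak memory
is driven by the longest record rather than by the size of the input split.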