Subject: Re: Profiling Hadoop Job
From: Jie Li <jieli@cs.duke.edu>
To: common-user@hadoop.apache.org
Date: Wed, 7 Mar 2012 15:47:05 -0500

Hi Leo,

Thanks for pointing out the outdated README file. Glad to tell you that we do
support the old API in the latest version. See here:
http://www.cs.duke.edu/starfish/previous.html

You are welcome to join our mailing list, where your questions will reach more
of our group members.

Jie

On Wed, Mar 7, 2012 at 3:37 PM, Leonardo Urbina wrote:

> Hi Jie,
>
> According to the Starfish README, Hadoop programs must be written using the
> new Hadoop API. This is not my case (I am using MultipleInputs among other
> features not supported by the new API). Is there any way around this?
> Thanks,
>
> -Leo
>
> On Wed, Mar 7, 2012 at 3:19 PM, Jie Li wrote:
>
> > Hi Leonardo,
> >
> > You might want to try Starfish, which supports memory profiling as well
> > as CPU/disk/network profiling for performance tuning.
> >
> > Jie
> > ------------------
> > Starfish is an intelligent performance tuning tool for Hadoop.
> > Homepage: www.cs.duke.edu/starfish/
> > Mailing list: http://groups.google.com/group/hadoop-starfish
> >
> >
> > On Wed, Mar 7, 2012 at 2:36 PM, Leonardo Urbina wrote:
> >
> > > Hello everyone,
> > >
> > > I have a Hadoop job that I run on several GBs of data that I am trying
> > > to optimize in order to reduce memory consumption as well as improve
> > > speed. I am following the steps outlined in Tom White's "Hadoop: The
> > > Definitive Guide" for profiling using HPROF (p. 161), by setting the
> > > following properties in the JobConf:
> > >
> > > job.setProfileEnabled(true);
> > > job.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,depth=6,"
> > >     + "force=n,thread=y,verbose=n,file=%s");
> > > job.setProfileTaskRange(true, "0-2");
> > > job.setProfileTaskRange(false, "0-2");
> > >
> > > I am trying to run this locally on a single pseudo-distributed install
> > > of Hadoop (0.20.2) and it gives the following error:
> > >
> > > Exception in thread "main" java.io.FileNotFoundException:
> > > attempt_201203071311_0004_m_000000_0.profile (Permission denied)
> > >     at java.io.FileOutputStream.open(Native Method)
> > >     at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
> > >     at java.io.FileOutputStream.<init>(FileOutputStream.java:84)
> > >     at org.apache.hadoop.mapred.JobClient.downloadProfile(JobClient.java:1226)
> > >     at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1302)
> > >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
> > >     at com.BitSight.hadoopAggregator.AggregatorDriver.run(AggregatorDriver.java:89)
> > >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >     at com.BitSight.hadoopAggregator.AggregatorDriver.main(AggregatorDriver.java:94)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >     at java.lang.reflect.Method.invoke(Method.java:597)
> > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > >
> > > However, I can access these logs directly from the tasktracker's logs
> > > (through the web UI). For the sake of running this locally, I could just
> > > ignore this error, but I want to be able to profile the job once it is
> > > deployed to our Hadoop cluster, and I need to be able to automatically
> > > retrieve these logs. Do I need to change the permissions in HDFS to allow
> > > for this? Any ideas on how to fix this? Thanks in advance,
> > >
> > > Best,
> > > -Leo
> > >
> > > --
> > > Leo Urbina
> > > Massachusetts Institute of Technology
> > > Department of Electrical Engineering and Computer Science
> > > Department of Mathematics
> > > lurbina@mit.edu
>
> --
> Leo Urbina
> Massachusetts Institute of Technology
> Department of Electrical Engineering and Computer Science
> Department of Mathematics
> lurbina@mit.edu
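
Below is a minimal sketch of the HPROF setup Leo describes, assuming the old
org.apache.hadoop.mapred (JobConf) API on Hadoop 0.20. The driver name, the
identity mapper/reducer, and the input/output path arguments are placeholders
rather than Leo's actual job; only the profiling calls follow the snippet
quoted above.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Hypothetical driver; stands in for a real job such as Leo's AggregatorDriver.
    public class ProfiledJobDriver extends Configured implements Tool {

      @Override
      public int run(String[] args) throws Exception {
        JobConf job = new JobConf(getConf(), ProfiledJobDriver.class);
        job.setJobName("profiled-job");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Identity mapper/reducer as placeholders for the real job logic.
        job.setMapperClass(IdentityMapper.class);
        job.setReducerClass(IdentityReducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // HPROF profiling, as in the snippet quoted above.
        job.setProfileEnabled(true);
        job.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,depth=6,"
            + "force=n,thread=y,verbose=n,file=%s");
        job.setProfileTaskRange(true, "0-2");   // profile map task attempts 0-2
        job.setProfileTaskRange(false, "0-2");  // profile reduce task attempts 0-2

        JobClient.runJob(job);
        return 0;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new ProfiledJobDriver(), args));
      }
    }

One note on the error itself: the stack trace shows JobClient.downloadProfile
opening a local java.io.FileOutputStream, which suggests the attempt_*.profile
files are written into the local working directory from which the job was
submitted. The "Permission denied" therefore most likely points at permissions
on that local client directory rather than at HDFS permissions.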