hadoop-common-user mailing list archives

From Leonardo Urbina <lurb...@mit.edu>
Subject Re: Profiling Hadoop Job
Date Wed, 07 Mar 2012 20:37:19 GMT
Hi Jie,

According to the Starfish README, Hadoop programs must be written against
the new Hadoop API. This is not my case (I am using MultipleInputs, among
other features the new API does not support). Is there any way around this?
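
(For concreteness, here is a minimal sketch of the two forms; the mapper
class names and paths are placeholders, and as far as I know the new-API
MultipleInputs was only added after 0.20:)

        // Old (stable) API: org.apache.hadoop.mapred.lib.MultipleInputs
        JobConf conf = new JobConf(AggregatorDriver.class);
        MultipleInputs.addInputPath(conf, new Path("/data/a"),
                TextInputFormat.class, MapperA.class);

        // New API: org.apache.hadoop.mapreduce.lib.input.MultipleInputs,
        // paired with the mapreduce.* Job, TextInputFormat and Mapper
        Job job = new Job(new Configuration());
        MultipleInputs.addInputPath(job, new Path("/data/a"),
                TextInputFormat.class, NewMapperA.class);

Thanks,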

-Leo

On Wed, Mar 7, 2012 at 3:19 PM, Jie Li <jieli@cs.duke.edu> wrote:

> Hi Leonardo,
>
> You might want to try Starfish, which supports memory profiling as well
> as CPU/disk/network profiling for performance tuning.
>
> Jie
> ------------------
> Starfish is an intelligent performance tuning tool for Hadoop.
> Homepage: www.cs.duke.edu/starfish/
> Mailing list: http://groups.google.com/group/hadoop-starfish
>
>
> On Wed, Mar 7, 2012 at 2:36 PM, Leonardo Urbina <lurbina@mit.edu> wrote:
>
> > Hello everyone,
> >
> > I have a Hadoop job that runs over several GB of data, which I am trying
> > to optimize in order to reduce memory consumption and improve speed. I am
> > following the steps outlined in Tom White's "Hadoop: The Definitive
> > Guide" for profiling with HPROF (p. 161), setting the following
> > properties in the JobConf:
> >
> >        job.setProfileEnabled(true);
> >        job.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,depth=6," +
> >                "force=n,thread=y,verbose=n,file=%s");
> >        job.setProfileTaskRange(true, "0-2");
> >        job.setProfileTaskRange(false, "0-2");
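> >
> > (Equivalently, here is a minimal sketch of the same settings made
> > through the raw configuration properties on the JobConf above; as far
> > as I know these are the property names the setters write in 0.20:)
> >
> >        // what setProfileEnabled / setProfileParams / setProfileTaskRange set
> >        job.setBoolean("mapred.task.profile", true);
> >        job.set("mapred.task.profile.params",
> >                "-agentlib:hprof=cpu=samples,heap=sites,depth=6," +
> >                "force=n,thread=y,verbose=n,file=%s");
> >        job.set("mapred.task.profile.maps", "0-2");
> >        job.set("mapred.task.profile.reduces", "0-2");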
> >
> > I am trying to run this locally on a pseudo-distributed install of
> > Hadoop (0.20.2), and it fails with the following error:
> >
> > Exception in thread "main" java.io.FileNotFoundException:
> > attempt_201203071311_0004_m_000000_0.profile (Permission denied)
> >        at java.io.FileOutputStream.open(Native Method)
> >        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
> >        at java.io.FileOutputStream.<init>(FileOutputStream.java:84)
> >        at org.apache.hadoop.mapred.JobClient.downloadProfile(JobClient.java:1226)
> >        at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1302)
> >        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
> >        at com.BitSight.hadoopAggregator.AggregatorDriver.run(AggregatorDriver.java:89)
> >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >        at com.BitSight.hadoopAggregator.AggregatorDriver.main(AggregatorDriver.java:94)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >        at java.lang.reflect.Method.invoke(Method.java:597)
> >        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >
> > However, I can access these logs directly from the tasktracker (through
> > the web UI). For the sake of running this locally, I could just ignore
> > this error, but I want to be able to profile the job once it is deployed
> > to our hadoop cluster, and I need to retrieve these logs automatically.
> > Do I need to change the permissions in HDFS to allow for this? Any ideas
> > on how to fix this? Thanks in advance,
> >
> > Best,
> > -Leo
> >
> > --
> > Leo Urbina
> > Massachusetts Institute of Technology
> > Department of Electrical Engineering and Computer Science
> > Department of Mathematics
> > lurbina@mit.edu
> >
>



-- 
Leo Urbina
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Department of Mathematics
lurbina@mit.edu
