hadoop-common-user mailing list archives

From Sean Laurent <organicveg...@gmail.com>
Subject Re: Jobs run slower and slower
Date Tue, 03 Mar 2009 22:02:16 GMT
Hrmmm. According to hadoop-default.xml,
mapred.jobtracker.completeuserjobs.maximum defaults to 100. So I tried
setting it to 1, but that had no effect. I still see each successive run
taking longer than the previous run.
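
(For reference, here is the override itself, in the same key=value
shorthand as my config list below. It lives in my hadoop-site.xml, and
as far as I can tell the JobTracker only picks it up at startup, hence
the restart in step 1:)

  mapred.jobtracker.completeuserjobs.maximum=1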

1) Restart M/R
2) Run #1: 142.12 secs
3) Run #2: 181.96 secs
4) Run #3: 221.95 secs
5) Run #4: 281.96 secs

I don't think that's the problem here... :(

-S

On Tue, Mar 3, 2009 at 2:33 PM, Runping Qi <runping.qi@gmail.com> wrote:

> The JobTracker's memory increased as you ran more and more jobs because
> the JobTracker still keeps some data about completed jobs. The maximum
> number of completed jobs kept per user is determined by the config
> variable mapred.jobtracker.completeuserjobs.maximum.
> You can lower that value to reduce the JobTracker's memory consumption.
>
>
> On Tue, Mar 3, 2009 at 10:01 AM, Sean Laurent <organicveggie@gmail.com> wrote:
>
> > Interesting... from reading HADOOP-4766, I'm not entirely clear if that
> > problem is related to the number of jobs or the number of tasks.
> >
> > - I'm only running a single job with approximately 900 map tasks, as
> > opposed to the 500-600+ jobs and 100K tasks described in HADOOP-4766.
> > - I am seeing increased memory use on the JobTracker.
> > - I am seeing elevated memory use over time on the DataNode/TaskTracker
> > machines.
> > - Amar's description in HADOOP-4766 from December 6th sounds pretty
> > similar.
> >
> > I also tried adjusting garbage collection via -XX:+UseParallelGC, but
> > that had no noticeable impact.
> >
> > It also wasn't clear to me what, if anything, I can do to fix or work
> > around the problem.
> >
> > Any advice would be greatly appreciated.
> >
> > -Sean
> >
> > On Mon, Mar 2, 2009 at 7:50 PM, Runping Qi <runping.qi@gmail.com> wrote:
> >
> > > Your problem may be related to
> > > https://issues.apache.org/jira/browse/HADOOP-4766
> > >
> > > Runping
> > >
> > >
> > > On Mon, Mar 2, 2009 at 4:46 PM, Sean Laurent <organicveggie@gmail.com> wrote:
> > >
> > > > Hi all,
> > > > I'm conducting some initial tests with Hadoop to better understand
> > > > how well it will handle and scale with some of our specific problems.
> > > > As a result, I've written some M/R jobs that are representative of
> > > > the work we want to do. I then run the jobs multiple times in a row
> > > > (sequentially) to get a rough estimate of the average run-time.
> > > >
> > > > What I'm seeing is really strange... If I run the same job with the
> > > > same inputs multiple times, each successive run is slower than the
> > > > previous run. If I restart the cluster and re-run the tests, the
> > > > first run is fast and then each successive run is slower.
> > > >
> > > > For example, I just started the cluster and ran the same job 4 times.
> > > > The run times for the jobs were as follows: 127 seconds, 177 seconds,
> > > > 207 seconds and 218 seconds. I restarted HDFS and M/R, reran the job
> > > > 3 more times and got the following run times: 138 seconds, 187
> > > > seconds and 221 seconds. :(
> > > >
> > > > The map task is pretty simple: parse XML files to extract specific
> > > > elements. I'm using Cascading and wrote a custom Scheme, which in
> > > > turn uses a custom FileInputFormat that treats each file as an
> > > > entire record (splitable = false). Each file is then treated as a
> > > > separate map task with no reduce step.
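> > > >
> > > > (For the curious, the input format boils down to something like the
> > > > following sketch against the 0.19 mapred API; class names here are
> > > > illustrative, not our exact code:)
> > > >
> > > >   import java.io.IOException;
> > > >   import org.apache.hadoop.fs.FSDataInputStream;
> > > >   import org.apache.hadoop.fs.FileSystem;
> > > >   import org.apache.hadoop.fs.Path;
> > > >   import org.apache.hadoop.io.BytesWritable;
> > > >   import org.apache.hadoop.io.NullWritable;
> > > >   import org.apache.hadoop.mapred.*;
> > > >
> > > >   public class WholeFileInputFormat
> > > >       extends FileInputFormat<NullWritable, BytesWritable> {
> > > >
> > > >     // Never split a file: each XML file becomes exactly one map task.
> > > >     protected boolean isSplitable(FileSystem fs, Path file) {
> > > >       return false;
> > > >     }
> > > >
> > > >     public RecordReader<NullWritable, BytesWritable> getRecordReader(
> > > >         InputSplit split, JobConf job, Reporter reporter)
> > > >         throws IOException {
> > > >       return new WholeFileRecordReader((FileSplit) split, job);
> > > >     }
> > > >
> > > >     // Hands the entire file to the mapper as a single record.
> > > >     static class WholeFileRecordReader
> > > >         implements RecordReader<NullWritable, BytesWritable> {
> > > >       private final FileSplit split;
> > > >       private final JobConf job;
> > > >       private boolean processed = false;
> > > >
> > > >       WholeFileRecordReader(FileSplit split, JobConf job) {
> > > >         this.split = split;
> > > >         this.job = job;
> > > >       }
> > > >
> > > >       public boolean next(NullWritable key, BytesWritable value)
> > > >           throws IOException {
> > > >         if (processed) return false;
> > > >         byte[] contents = new byte[(int) split.getLength()];
> > > >         Path file = split.getPath();
> > > >         FileSystem fs = file.getFileSystem(job);
> > > >         FSDataInputStream in = fs.open(file);
> > > >         try {
> > > >           in.readFully(contents);        // slurp the whole file
> > > >           value.set(contents, 0, contents.length);
> > > >         } finally {
> > > >           in.close();
> > > >         }
> > > >         processed = true;
> > > >         return true;
> > > >       }
> > > >
> > > >       public NullWritable createKey() { return NullWritable.get(); }
> > > >       public BytesWritable createValue() { return new BytesWritable(); }
> > > >       public long getPos() { return processed ? split.getLength() : 0; }
> > > >       public float getProgress() { return processed ? 1.0f : 0.0f; }
> > > >       public void close() { }
> > > >     }
> > > >   }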
> > > >
> > > > In this case I have an 8-node cluster. 1 node acts as a dedicated
> > > > NameNode/JobTracker and 7 nodes run the DataNode/TaskTracker. Each
> > > > machine is identical: a Dell 1950 with a quad-core 2.5GHz Intel CPU,
> > > > 8GB RAM and 2 250GB SATA2 drives. All 8 machines are in the same
> > > > rack, running on a dedicated Force10 gigabit switch.
> > > >
> > > > I tried enabling JVM reuse via JobConf, which improved performance
> > > > for the initial few runs... but each successive job still took
> > > > longer than the previous one. I also tried increasing the maximum
> > > > memory via the mapred.child.java.opts property, but that didn't have
> > > > any impact.
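> > > >
> > > > (Concretely, the reuse setting was along these lines; a sketch,
> > > > assuming the knob HADOOP-249 added in 0.19, where -1 means reuse a
> > > > child JVM for an unlimited number of tasks:)
> > > >
> > > >   JobConf conf = new JobConf();
> > > >   // Reuse each spawned child JVM across tasks instead of forking a
> > > >   // fresh JVM per task; equivalent to setting the property
> > > >   // mapred.job.reuse.jvm.num.tasks=-1 in the job config.
> > > >   conf.setNumTasksToExecutePerJvm(-1);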
> > > >
> > > > I checked the logs, but I don't see any errors.
> > > >
> > > > Here's my basic list of configured properties:
> > > >
> > > > fs.default.name=hdfs://dn01.hadoop.mycompany.com:9000
> > > > mapred.job.tracker=dn01.hadoop.mycompany.com:9001
> > > > dfs.replication=3
> > > > dfs.block.size=1048576
> > > > dfs.name.dir=/opt/hadoop/volume1/name,/opt/hadoop/volume2/name
> > > > dfs.data.dir=/opt/hadoop/volume1/data,/opt/hadoop/volume2/data
> > > >
> > > > mapred.local.dir=/opt/hadoop/volume1/mapred,/opt/hadoop/volume2/mapred
> > > > mapred.child.java.opts=-Xmx1532m
> > > >
> > > > Frankly I'm stumped. I'm sure there is something obvious that I'm
> > > > missing, but I'm totally at a loss right now. Any suggestions would
> > > > be ~greatly~ appreciated.
