hadoop-common-user mailing list archives

From Derek Brown <de...@media6degrees.com>
Subject Fair Scheduler config issues
Date Thu, 03 Dec 2009 03:46:31 GMT
I'm using Cloudera's distribution of 0.20.1, but this seems like a general
question, so I'm posting here.

I'm having some issues getting the Fair Scheduler set up. I followed the
basic instructions from
http://hadoop.apache.org/common/docs/current/fair_scheduler.html:

* Added to mapred-site.xml:

  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>

  <property>
    <name>mapred.fairscheduler.allocation.file</name>
    <value>/etc/hadoop/conf/fairscheduler.xml</value>
  </property>

The fair scheduler jar was already in the installation's root lib/ directory.

* Added the basic fairscheduler.xml, based on the example in the docs.

  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>${pool.name}</value>
    <description>...</description>
  </property>

  <property>
    <name>pool.name</name>
    <value>${user.name}</value>
    <description>...</description>
  </property>

When I run a job (say, one of the examples, such as the pi estimator, word
count, or sleep) and check myhost:50030/scheduler, I see the job listed in
the Pools table in the "hadoop" row, since that's the user. That makes
sense. In the Running Jobs table, however, the dropdown in the Pool column
sometimes shows "hadoop" and sometimes "default" when I reload the page,
which is odd.

Then if I change the value of the pool.name entry in the xml to a hardcoded
value, say "foo", with a matching "foo" <pool> entry in the xml, and run a
job (restarting the JobTracker to be safe), I do see a "foo" row in the Pools
table, but it shows 0 Running Jobs, and "default" shows the one job. Also,
the Pool listed in the dropdown in the Running Jobs table remains "default"
rather than "foo" (although "foo" is a choice, and I CAN select it to change
the pool).
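
For concreteness, a matching "foo" <pool> entry could look roughly like the
sketch below. It assumes the <allocations>/<pool> layout described in the
Fair Scheduler docs; the minimum-share numbers are placeholders chosen for
illustration, not values from my cluster.

```xml
<?xml version="1.0"?>
<!-- fairscheduler.xml: allocation file sketch. The pool name matches the
     hardcoded pool.name value; the numbers below are assumed placeholders. -->
<allocations>
  <pool name="foo">
    <!-- minimum share of map and reduce slots for this pool (illustrative) -->
    <minMaps>1</minMaps>
    <minReduces>1</minReduces>
  </pool>
</allocations>
```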

I'd expect that if I set pool.name in fairscheduler.xml, jobs would run, and
appear, under that pool. Am I missing something in my setup or in my
understanding of how this should work? Thanks for any insight. What I'd
ultimately like to be able to do is set the pool name on the command line
when running a job, with an argument such as "-Dpool.name=bar".

Thanks,
Derek
