hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "IdeasOnLdapConfiguration" by SomeOtherAccount
Date Wed, 12 May 2010 19:29:24 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "IdeasOnLdapConfiguration" page has been changed by SomeOtherAccount.
http://wiki.apache.org/hadoop/IdeasOnLdapConfiguration?action=diff&rev1=5&rev2=6

--------------------------------------------------

  
  Are we a tasktracker? (&(objectclass=hadoopTaskTracker)(hostname=node1)):  We got an
object back!  Fire up the task tracker with that object's info.
  
- From these base definitions, we can do more complex things:
+ Let's do a more complex example.  What happens if we have more than one type of compute
node?  We just have multiple definitions of our compute nodes: 
  
  {{{
- commonname=simplecomputenode1,cluster=red
+ commonname=computenode1,cluster=red
  objectclass=hadoopDataNode,hadoopTaskTracker
  hostname:  node1,node2,node3
  dfs.data.dir: /hdfs1, /hdfs2, /hdfs3
@@ -125, +125 @@

  mapred.tasktracker.map.tasks.maximum: 4
  mapred.tasktracker.reduce.tasks.maximum: 4
  
- commonname=simplecomputenode2,cluster=red
+ commonname=computenode2,cluster=red
  objectclass=hadoopDataNode,hadoopTaskTracker
  hostname:  node4,node5,node6
  dfs.data.dir: /hdfs1, /hdfs2, /hdfs3, /hdfs4
@@ -136, +136 @@

  mapred.tasktracker.reduce.tasks.maximum: 4
  }}}
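As a sketch of how a starting daemon might pick out its own definition from entries like these, here is a toy in-memory stand-in for the directory (illustrative only, not a real LDAP client; entry names mirror the wiki example):

```python
# Toy in-memory stand-in for the LDAP directory sketched above.
# A daemon starting on a given host would issue a filter like
# (&(objectclass=hadoopTaskTracker)(hostname=node5)) and configure
# itself from whatever entry comes back.
ENTRIES = [
    {
        "commonname": "computenode1,cluster=red",
        "objectclass": ["hadoopDataNode", "hadoopTaskTracker"],
        "hostname": ["node1", "node2", "node3"],
        "mapred.tasktracker.map.tasks.maximum": "4",
    },
    {
        "commonname": "computenode2,cluster=red",
        "objectclass": ["hadoopDataNode", "hadoopTaskTracker"],
        "hostname": ["node4", "node5", "node6"],
        "mapred.tasktracker.map.tasks.maximum": "8",
    },
]

def find_entry(objectclass, hostname):
    """Mimic the filter (&(objectclass=...)(hostname=...))."""
    for entry in ENTRIES:
        if objectclass in entry["objectclass"] and hostname in entry["hostname"]:
            return entry
    return None

# node5 is listed under computenode2, so it gets that entry's settings.
entry = find_entry("hadoopTaskTracker", "node5")
```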
  
+ What if we want more than one job tracker talking to the same HDFS?   LDAP makes defining
this easy:
+ 
+ {{{
+ commonname=masternn,cluster=red
+ objectclass=hadoopNameNode
+ dfs.http.address: http://masternn:50070/
+ hostname: masternn
+ dfs.name.dir: /nn1,/nn2
+ 
+ commonname=jt1,cluster=red
+ mapred.reduce.tasks: 1
+ mapred.reduce.slowstart.completed.maps: .55
+ mapred.queue.names: big,small
+ mapred.jobtracker.taskScheduler: capacity
+ mapred.system.dir: /system/mapred
+ hostname: jt1
+ 
+ commonname=jt2,cluster=red
+ mapred.reduce.tasks: 1
+ mapred.reduce.slowstart.completed.maps: .55
+ mapred.queue.names: etl
+ mapred.jobtracker.taskScheduler: capacity
+ mapred.system.dir: /system/mapred
+ hostname: jt2
+ 
+ commonname=computenode1,cluster=red
+ objectclass=hadoopDataNode,hadoopTaskTracker
+ hostname:  node1,node2,node3
+ dfs.data.dir: /hdfs1, /hdfs2, /hdfs3
+ dfs.datanode.du.reserved: 10
+ mapred.job.tracker: commonname=jt1,cluster=red
+ mapred.local.dir: /mr1,/mr2,/mr3
+ mapred.tasktracker.map.tasks.maximum: 4
+ mapred.tasktracker.reduce.tasks.maximum: 4
+ 
+ commonname=computenode2,cluster=red
+ objectclass=hadoopDataNode,hadoopTaskTracker
+ hostname:  node4,node5,node6
+ dfs.data.dir: /hdfs1, /hdfs2, /hdfs3, /hdfs4
+ dfs.datanode.du.reserved: 10
+ mapred.job.tracker: commonname=jt2,cluster=red
+ mapred.local.dir: /mr1,/mr2,/mr3,/mr4
+ mapred.tasktracker.map.tasks.maximum: 8
+ mapred.tasktracker.reduce.tasks.maximum: 4
+ }}}
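Note how `mapred.job.tracker` in each compute-node entry names another entry by its distinguished name. A tasktracker would dereference that attribute to find its job tracker. A minimal sketch of that lookup, again with a toy dictionary in place of a real directory:

```python
# Illustrative sketch (not a real LDAP client): following the
# mapred.job.tracker DN reference from a compute-node entry to the
# job-tracker entry it names, as in the example above.
DIRECTORY = {
    "commonname=jt1,cluster=red": {"hostname": "jt1", "mapred.queue.names": "big,small"},
    "commonname=jt2,cluster=red": {"hostname": "jt2", "mapred.queue.names": "etl"},
    "commonname=computenode1,cluster=red": {
        "hostname": ["node1", "node2", "node3"],
        "mapred.job.tracker": "commonname=jt1,cluster=red",
    },
    "commonname=computenode2,cluster=red": {
        "hostname": ["node4", "node5", "node6"],
        "mapred.job.tracker": "commonname=jt2,cluster=red",
    },
}

def jobtracker_for(node_dn):
    """Follow the node's mapred.job.tracker attribute to the referenced entry."""
    node = DIRECTORY[node_dn]
    return DIRECTORY[node["mapred.job.tracker"]]

# computenode1 nodes report to jt1; computenode2 nodes report to jt2.
jt = jobtracker_for("commonname=computenode1,cluster=red")
```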
+ 
- We can define multiple definitions for the same grid.  This is important when you consider
that small-medium sized grids are likely to have a mix of nodes.  For example, some nodes
may have 8 cores with four disks and some nodes may have 6 cores with eight disks.  If they
are part of the same cluster, they will need different mapred-site.xml settings in order to
maximize the hardware purchase.
+ This is important when you consider that small-medium sized grids are likely to have a mix
of nodes.  For example, some nodes may have 8 cores with four disks and some nodes may have
6 cores with eight disks.  If they are part of the same cluster, they will need different
mapred-site.xml settings in order to maximize the hardware purchase.
  
+ ----
  
  From the client side, this is a huge win.  We can do things like:
  
  {{{
  $ hadoop listgrids
+ red
+ green
  }}}
  
- In LDAP terms, this would be fetching the (objectclass=hadoopGlobalConfig) and reporting
all clusternames.  We could also do:
+ In LDAP terms, this would be fetching the (objectclass=hadoopGlobalConfig) and reporting
all clusternames.  
+ 
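A sketch of what that `listgrids` lookup could amount to, using placeholder entries (the hadoopGlobalConfig attribute names here are assumptions consistent with the text, not a defined schema):

```python
# Sketch of `hadoop listgrids`: fetch every (objectclass=hadoopGlobalConfig)
# entry from the directory and report its clustername. Entries are
# illustrative placeholders standing in for real LDAP results.
ENTRIES = [
    {"objectclass": ["hadoopGlobalConfig"], "clustername": "red"},
    {"objectclass": ["hadoopGlobalConfig"], "clustername": "green"},
    {"objectclass": ["hadoopNameNode"], "hostname": "masternn"},
]

def list_grids(entries):
    """Return the clusternames of all hadoopGlobalConfig entries."""
    return [e["clustername"] for e in entries
            if "hadoopGlobalConfig" in e["objectclass"]]
```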
+ I can also submit a job without knowing any particulars or having a bunch of config files
to manage:
+ 
+ {{{
+ $ hadoop job -grid red -jt jt1 -jar ....
+ }}}
+ 
+ 
+ Because we have access to all grid definitions, we could also do:
  
  {{{
  $ hadoop distcp red:/my/dir  green:/my/dir
