hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "FAQ" by KonstantinShvachko
Date Wed, 07 Nov 2007 02:16:20 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by KonstantinShvachko:
http://wiki.apache.org/lucene-hadoop/FAQ

------------------------------------------------------------------------------
  
  [[BR]]
  [[Anchor(6)]]
- '''6. [#6 If I add new data-nodes to the cluster will HDFS move the blocks to the newly added nodes in order to balance disk space utilization between the nodes?]'''
+ '''6. [#6 HDFS. If I add new data-nodes to the cluster will HDFS move the blocks to the newly added nodes in order to balance disk space utilization between the nodes?]'''
  
  No, HDFS will not move blocks to new nodes automatically. However, newly created files will likely have their blocks placed on the new nodes.
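  
  As a hedged illustration (the paths below are made up): since block placement happens at write time, one way to spread existing data onto the new nodes is to rewrite the files, e.g. by copying and replacing them:
  
  {{{
# Copy the data; the copy's blocks are placed across the enlarged cluster.
bin/hadoop distcp /user/alice/data /user/alice/data.copy
# Replace the original with the rewritten copy.
bin/hadoop dfs -rmr /user/alice/data
bin/hadoop dfs -mv /user/alice/data.copy /user/alice/data
}}}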
  
@@ -93, +93 @@

  
  [[BR]]
  [[Anchor(7)]]
- '''7. [#7 What is the purpose of the secondary name-node?]'''
+ '''7. [#7 HDFS. What is the purpose of the secondary name-node?]'''
  
  The term "secondary name-node" is somewhat misleading.
  It is not a name-node in the sense that data-nodes cannot connect to the secondary name-node,
@@ -114, +114 @@

  
  [[BR]]
  [[Anchor(8)]]
- '''8. [#8 What is the Distributed Cache used for?]'''
+ '''8. [#8 MR. What is the Distributed Cache used for?]'''
  
  The distributed cache is used to distribute large read-only files that are needed by map/reduce jobs to the cluster. The framework will copy the necessary files from a URL (either hdfs: or http:) onto the slave node before any tasks for the job are executed on that node. The files are only copied once per job and so should not be modified by the application.
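  
  A minimal sketch, assuming the ''org.apache.hadoop.filecache.DistributedCache'' API of this release; the URI and class names are illustrative:
  
  {{{
// In the job driver: register a read-only file with the distributed cache.
JobConf conf = new JobConf(MyJob.class);
DistributedCache.addCacheFile(new URI("hdfs://namenode:9000/lookup/terms.txt"), conf);

// In the mapper's configure() method: locate the local copies of the cached files.
Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
}}}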
  
  
  [[BR]]
  [[Anchor(9)]]
- '''9. [#9 Can I write create/write-to hdfs files directly from my map/reduce tasks?]'''
+ '''9. [#9 MR. Can I create/write-to hdfs files directly from my map/reduce tasks?]'''
  
  Yes. (Clearly, you want this since you need to create/write-to files other than the output-file written out by [http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/OutputCollector.html OutputCollector].)
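  
  A hedged sketch of a task creating its own side-file; the naming scheme (using the task id to avoid collisions between tasks) is illustrative:
  
  {{{
// Inside a map/reduce task, given the task's JobConf:
FileSystem fs = FileSystem.get(job);
Path sideFile = new Path(job.getOutputPath(), "side_" + job.get("mapred.task.id"));
FSDataOutputStream out = fs.create(sideFile);
out.writeBytes("...");
out.close();
}}}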
  
@@ -148, +148 @@

  
  [[BR]]
  [[Anchor(10)]]
- '''10. [#10 How do I get each of my maps to work on one complete input-file and not allow the framework to split-up my files?]'''
+ '''10. [#10 MR. How do I get each of my maps to work on one complete input-file and not allow the framework to split-up my files?]'''
  
  Essentially a job's input is represented by the [http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/InputFormat.html InputFormat] (interface) / [http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/FileInputFormat.html FileInputFormat] (base class).
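  
  For instance, a hedged sketch of one way to do this, overriding ''isSplitable()'' so each file becomes a single split (the class name is made up):
  
  {{{
public class WholeFileTextInputFormat extends TextInputFormat {
  // Returning false hands each complete file to a single map task.
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }
}
}}}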
  
@@ -171, +171 @@

  
  [[BR]]
  [[Anchor(12)]]
- '''12. [#12 Does the name-node stay in safe mode till all under-replicated files are fully replicated?]'''
+ '''12. [#12 HDFS. Does the name-node stay in safe mode till all under-replicated files are fully replicated?]'''
  
  No. During safe mode, replication of blocks is prohibited. 
  The name-node waits until all or a majority of the data-nodes report their blocks.
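  
  A hedged example, assuming this release's ''dfsadmin'' command supports the ''-safemode'' option:
  
  {{{
# Check whether the name-node is currently in safe mode.
bin/hadoop dfsadmin -safemode get
# Force the name-node to leave safe mode manually.
bin/hadoop dfsadmin -safemode leave
}}}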
@@ -193, +193 @@

  
  [[BR]]
  [[Anchor(13)]]
- '''13. [#13 I see a maximum of 2 maps/reduces spawned concurrently on each TaskTracker, how do I increase that?]'''
+ '''13. [#13 MR. I see a maximum of 2 maps/reduces spawned concurrently on each TaskTracker, how do I increase that?]'''
  
  Use the configuration knob [http://lucene.apache.org/hadoop/hadoop-default.html#mapred.tasktracker.tasks.maximum mapred.tasktracker.tasks.maximum] to control the number of maps/reduces spawned simultaneously on a !TaskTracker. By default it is set to ''2'', hence one sees a maximum of 2 maps and 2 reduces at any given time on a !TaskTracker.
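  
  For example, a hedged ''hadoop-site.xml'' snippet raising the limit (the value ''4'' is illustrative):
  
  {{{
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>4</value>
</property>
}}}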
  
@@ -201, +201 @@

     * ''mapred.tasktracker.tasks.maximum'' is a cluster-wide limit, i.e. it is controlled at the !JobTracker end. [http://issues.apache.org/jira/browse/HADOOP-1245 HADOOP-1245] should fix that.
     * ''mapred.tasktracker.tasks.maximum'' controls the number of maps '''and''' the number of reduces with a single value, so the two cannot be set independently. [http://issues.apache.org/jira/browse/HADOOP-1274 HADOOP-1274] should fix that.
  
+ 
  [[BR]]
- [[Anchor(13)]]
+ [[Anchor(14)]]
- '''13. [#13 Submitting map/reduce jobs as a different user doesn't work.]'''
+ '''14. [#14 MR. Submitting map/reduce jobs as a different user doesn't work.]'''
  
  The problem is that you haven't configured your map/reduce system  
  directory to a fixed value. The default works for single node systems, but not for  
@@ -222, +223 @@

  accessible from both the client and server machines and is typically  
  in HDFS.
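  
  A hedged ''hadoop-site.xml'' sketch, assuming the directory is controlled by the ''mapred.system.dir'' parameter (the path is illustrative):
  
  {{{
<property>
  <name>mapred.system.dir</name>
  <value>/hadoop/mapred/system</value>
</property>
}}}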
  
+ 
+ [[BR]]
+ [[Anchor(15)]]
+ '''15. [#15 HDFS. How do I set up a hadoop node to use multiple volumes?]'''
+ 
+ ''Data-nodes'' can store blocks in multiple directories, typically allocated on different local disk drives.
+ To set up multiple directories, specify a comma-separated list of pathnames as the value of the configuration parameter
+ [http://lucene.apache.org/hadoop/hadoop-default.html#dfs.data.dir dfs.data.dir].
+ Data-nodes will attempt to place equal amounts of data in each of the directories.
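+ 
+ A hedged ''hadoop-site.xml'' sketch with two data directories (the mount points are illustrative):
+ 
+ {{{
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/hdfs/data,/disk2/hdfs/data</value>
</property>
}}}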
+ 
+ The ''name-node'' also supports multiple directories, which in this case store the name-space image and the edits log.
+ The directories are specified via the [http://lucene.apache.org/hadoop/hadoop-default.html#dfs.name.dir dfs.name.dir]
+ configuration parameter.
+ The name-node directories are used for name-space data replication, so that the image and the
+ log can be restored from the remaining volumes if one of them fails.
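+ 
+ Similarly, a hedged sketch with one local and one remote name directory; the paths are illustrative:
+ 
+ {{{
<property>
  <name>dfs.name.dir</name>
  <value>/disk1/hdfs/name,/remote/hdfs/name</value>
</property>
}}}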
+ 
