hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "FAQ" by GabrielReid
Date Wed, 08 Dec 2010 07:52:23 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "FAQ" page has been changed by GabrielReid.
The comment on this change is: Corrected link to HDFS user guide for use of secondary namenode.


  #pragma section-numbers on
  '''Hadoop FAQ'''
  = General =
  == What is Hadoop? ==
  [[http://hadoop.apache.org/core/|Hadoop]] is a distributed computing platform written in
Java.  It incorporates features similar to those of the [[http://en.wikipedia.org/wiki/Google_File_System|Google
File System]] and of [[http://en.wikipedia.org/wiki/MapReduce|MapReduce]].  For some details,
see HadoopMapReduce.
  == What platform does Hadoop run on? ==
   1. Java 1.6.x or higher, preferably from Sun; see HadoopJavaVersions
   1. Linux and Windows are the supported operating systems, but BSD, Mac OS/X, and OpenSolaris
are known to work. (Windows requires the installation of [[http://www.cygwin.com/|Cygwin]]).
  == How well does Hadoop scale? ==
  Hadoop has been demonstrated on clusters of up to 4000 nodes.  Sort performance on 900 nodes
is good (sorting 9TB of data on 900 nodes takes around 1.8 hours) and [[attachment:sort900-20080115.png|improving]]
using these non-default configuration values:
   * `dfs.block.size = 134217728`
@@ -38, +33 @@

   * `mapred.child.java.opts = -Xmx1024m`
  == What kind of hardware scales best for Hadoop? ==
  The short answer is dual processor/dual core machines with 4-8GB of RAM using ECC memory. Machines should be moderately high-end commodity machines to be most cost-effective; they typically cost 1/2 to 2/3 the cost of normal production application servers but are not desktop-class machines. This cost tends to be $2-5K. For a more detailed discussion, see MachineScaling.
  == How does GridGain compare to Hadoop? ==
  !GridGain does not support data-intensive jobs. For more details, see HadoopVsGridGain.
  == I have a new node I want to add to a running Hadoop cluster; how do I start services
on just one node? ==
  This also applies to the case where a machine has crashed and rebooted, etc., and you need to get it to rejoin the cluster. You do not need to shut down and/or restart the entire cluster in this case.
  First, add the new node's DNS name to the conf/slaves file on the master node.
@@ -58, +50 @@

  $ bin/hadoop-daemon.sh start datanode
  $ bin/hadoop-daemon.sh start tasktracker
  == Is there an easy way to see the status and health of a cluster? ==
  There are web-based interfaces to both the JobTracker (MapReduce master) and NameNode (HDFS
master) which display status pages about the state of the entire system. By default, these
are located at http://job.tracker.addr:50030/ and http://name.node.addr:50070/.
  The JobTracker status page will display the state of all nodes, as well as the job queue
and status about all currently running jobs and tasks. The NameNode status page will display
the state of all nodes and the amount of free space, and provides the ability to browse the
DFS via the web.
@@ -70, +60 @@

  $ bin/hadoop dfsadmin -report
  == How much network bandwidth might I need between racks in a medium size (40-80 node) Hadoop
cluster? ==
  The true answer depends on the types of jobs you're running. As a back-of-the-envelope calculation, one might figure something like this:
   * 60 nodes total on 2 racks = 30 nodes per rack.
   * Each node might process about 100MB/sec of data.
   * In the case of a sort job, where the intermediate data is the same size as the input data, each node needs to shuffle 100MB/sec of data.
   * In aggregate, each rack is then producing about 3GB/sec of data.
   * Given an even spread of reducers across the racks, each rack will need to send 1.5GB/sec to reducers running on the other rack. Since the connection is full duplex, that means you need 1.5GB/sec of bisection bandwidth for this theoretical job. So that's 12Gbps.
@@ -82, +70 @@

  So, the simple answer is that 4-6Gbps is most likely just fine for most practical jobs.
If you want to be extra safe, many inexpensive switches can operate in a "stacked" configuration
where the bandwidth between them is essentially backplane speed. That should scale you to
96 nodes with plenty of headroom. Many inexpensive gigabit switches also have one or two 10GigE
ports which can be used effectively to connect to each other or to a 10GE core.
  == How can I help to make Hadoop better? ==
  If you have trouble figuring out how to use Hadoop, then, once you've figured something out (perhaps with the help of the [[http://hadoop.apache.org/core/mailing_lists.html|mailing lists]]), pass that knowledge on to others by adding something to this wiki.
  If you find something that you wish were done better, and know how to fix it, read HowToContribute,
and contribute a patch.
  == I am seeing connection refused in the logs.  How do I troubleshoot this? ==
  See ConnectionRefused.
  == Does Hadoop require SSH? ==
  Hadoop-provided scripts (e.g., start-mapred.sh and start-dfs.sh) use ssh in order to start and stop the various daemons and some other utilities. The Hadoop framework in itself does not '''require''' ssh. Daemons (e.g. TaskTracker and DataNode) can also be started manually on each node without the scripts' help.
  = MapReduce =
  == Do I have to write my job in Java? ==
  No.  There are several ways to incorporate non-Java code.
   * HadoopStreaming permits any shell command to be used as a map or reduce function.
@@ -106, +89 @@

   * [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/pipes/package-summary.html|Hadoop Pipes]], a [[http://www.swig.org/|SWIG]]-compatible C++ API (non-JNI) to write map-reduce applications.
  == What is the Distributed Cache used for? ==
  The distributed cache is used to distribute large read-only files that are needed by map/reduce jobs to the cluster. The framework will copy the necessary files from a URL (either hdfs: or http:) onto the slave node before any tasks for the job are executed on that node. The files are only copied once per job and so should not be modified by the application.
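  As a rough sketch of typical usage with the old `org.apache.hadoop.mapred` API (the HDFS path and the idea of reading the file in `configure()` are illustrative assumptions, not part of this FAQ):
{{{
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// Driver side: register a read-only HDFS file with the distributed cache.
JobConf job = new JobConf();
DistributedCache.addCacheFile(
    new Path("hdfs:///user/example/lookup.dat").toUri(), job);

// Task side (typically in Mapper.configure(JobConf conf)):
// the framework has already copied the file to the node's local disk.
Path[] cached = DistributedCache.getLocalCacheFiles(conf);
// cached[0] points at the node-local copy of lookup.dat; open and read it.
}}}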
  == Can I create/write-to HDFS files directly from map/reduce tasks? ==
  Yes. (Clearly, you want this since you need to create/write-to files other than the output-file
written out by [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/OutputCollector.html|OutputCollector]].)
@@ -130, +111 @@

  The entire discussion holds true for maps of jobs with reducer=NONE (i.e. 0 reduces) since
output of the map, in that case, goes directly to hdfs.
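  For example (a hedged sketch against the old `org.apache.hadoop.mapred` API; the file name is an illustrative assumption), a task can create a side file under its own work output path, and the framework will promote it into the job's output directory only if the task attempt succeeds:
{{{
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

// Inside a task, with 'job' being the JobConf passed to configure():
Path workDir = FileOutputFormat.getWorkOutputPath(job);  // per-attempt temporary dir
FileSystem fs = workDir.getFileSystem(job);
FSDataOutputStream out = fs.create(new Path(workDir, "side-file.txt"));
out.writeBytes("output that does not go through the OutputCollector\n");
out.close();
}}}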
  == How do I get each of a job's maps to work on one complete input-file and not allow the
framework to split-up the files? ==
  Essentially a job's input is represented by the [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html|InputFormat]] (interface) / [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html|FileInputFormat]] (base class).
  For this purpose one would need a 'non-splittable' [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html|FileInputFormat]], i.e. an input-format which essentially tells the map-reduce framework that the input files cannot be split up and must be processed whole. To do this you need your particular input-format to return '''false''' for
the [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html#isSplitable(org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path)|isSplittable]]
@@ -142, +122 @@

  The other, quick-fix option is to set [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.min.split.size|mapred.min.split.size]] to a large enough value.
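  A minimal sketch of the non-splittable input format approach, assuming the old `org.apache.hadoop.mapred` API and plain text input (the class name is made up):
{{{
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// One map per input file: never let the framework split the files.
public class WholeFileTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }
}
}}}
  A job would then select it with `setInputFormat(WholeFileTextInputFormat.class)` on its JobConf.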
  == Why do I see broken images in the jobdetails.jsp page? ==
  In hadoop-0.15, Map/Reduce task completion graphics were added. The graphs are produced as SVG (Scalable Vector Graphics) images, which are basically XML files, embedded in HTML content. The graphics have been tested successfully in Firefox 2 on Ubuntu and Mac OS. For other browsers, one should install an additional plugin in the browser to see the SVG images. Adobe's SVG Viewer can be found at http://www.adobe.com/svg/viewer/install/.
  == I see a maximum of 2 maps/reduces spawned concurrently on each TaskTracker, how do I
increase that? ==
  Use the configuration knobs [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.tasktracker.map.tasks.maximum|mapred.tasktracker.map.tasks.maximum]] and [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.tasktracker.reduce.tasks.maximum|mapred.tasktracker.reduce.tasks.maximum]] to control the number of maps/reduces spawned simultaneously on a !TaskTracker. By default, each is set to ''2'', hence one sees a maximum of 2 maps and 2 reduces at a given instant on a !TaskTracker.
  You can set those on a per-tasktracker basis to accurately reflect your hardware (i.e. set them to higher values on beefier tasktrackers, etc.).
  == Submitting map/reduce jobs as a different user doesn't work. ==
  The problem is that you haven't configured your map/reduce system directory to a fixed value. The default works for single-node systems, but not for "real" clusters. I like to
@@ -166, +143 @@

  Note that this directory is in your default file system and must be accessible from both the client and server machines and is typically in HDFS.
  == How do Map/Reduce InputSplits handle record boundaries correctly? ==
  It is the responsibility of the InputSplit's RecordReader to start and end at a record boundary. For SequenceFiles, every 2k bytes has a 20-byte '''sync''' mark between the records. These sync marks allow the RecordReader to seek to the start of the InputSplit (which contains a file, offset and length) and find the first sync mark after the start of the split. The RecordReader continues processing records until it reaches the first sync mark after the end of the split. The first split of each file naturally starts immediately and not after the first sync mark. In this way, it is guaranteed that each record will be processed by exactly one mapper.
  Text files are handled similarly, using newlines instead of sync marks.
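  As a rough illustration of how the sync marks are used (this is not the framework's actual record reader; the path, offsets and key/value types are assumptions for the example):
{{{
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
SequenceFile.Reader reader =
    new SequenceFile.Reader(fs, new Path("/user/example/data.seq"), conf);

long start = 64L * 1024 * 1024;        // this split's offset into the file
long end = start + 64L * 1024 * 1024;  // this split's end
if (start > 0) {
  reader.sync(start);                  // seek to the first sync mark after 'start'
}
LongWritable key = new LongWritable();
Text value = new Text();
// Read every record that begins before 'end'; the real reader keeps going
// until the first sync mark after the end of the split, so no record is lost.
while (reader.getPosition() < end && reader.next(key, value)) {
  // process (key, value)
}
reader.close();
}}}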
  == How do I change the final output file name to a desired name rather than partition names like part-00000, part-00001? ==
  You can subclass the [[http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/OutputFormat.java?view=markup|OutputFormat.java]]
class and write your own. You can look at the code of [[http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/TextOutputFormat.java?view=markup|TextOutputFormat]]
etc. for reference. It might be the case that you only need to make minor changes to one of the existing Output Format classes. To do that you can just subclass that class and override the methods you need to change.
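  For instance (a hedged sketch for the old `org.apache.hadoop.mapred` API; the class name and the "result" prefix are made-up assumptions), overriding a single method of !TextOutputFormat is enough to rename the files while keeping the partition suffix:
{{{
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Progressable;

public class RenamedTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
  @Override
  public RecordWriter<K, V> getRecordWriter(FileSystem ignored, JobConf job,
                                            String name, Progressable progress)
      throws IOException {
    // 'name' arrives as e.g. "part-00000"; write "result-00000" instead,
    // keeping the partition number so concurrent reducers do not collide.
    return super.getRecordWriter(ignored, job,
        name.replace("part", "result"), progress);
  }
}
}}}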
  == When writing a new InputFormat, what is the format of the array of strings returned by InputSplit\#getLocations()? ==
  It appears that DatanodeID.getHost() is the standard place to retrieve this name, and the
machineName variable, populated in DataNode.java\#startDataNode, is where the name is first
set. The first method attempted is to get "slave.host.name" from the configuration; if that
is not available, DNS.getDefaultHost is used instead.
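  In other words, a custom split just returns plain hostnames, for example (the hostnames below are illustrative):
{{{
// Inside a custom InputSplit implementation: return the hostnames of the
// nodes that hold this split's data, in the same form that
// DatanodeID.getHost() reports them.
@Override
public String[] getLocations() throws IOException {
  return new String[] { "datanode01.example.com", "datanode02.example.com" };
}
}}}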
  == How do you gracefully stop a running job? ==
  hadoop job -kill JOBID
  == How do I limit (or increase) the number of concurrent tasks a job may have running total
at a time? ==
  == How do I limit (or increase) the number of concurrent tasks running on a node? ==
  For both answers, see LimitingTaskSlotUsage.
  = HDFS =
  == If I add new DataNodes to the cluster will HDFS move the blocks to the newly added nodes
in order to balance disk space utilization between the nodes? ==
  No, HDFS will not move blocks to new nodes automatically. However, newly created files will
likely have their blocks placed on the new nodes.
  There are several ways to rebalance the cluster manually.
@@ -209, +176 @@

    * [[http://hadoop.apache.org/core/docs/current/commands_manual.html#balancer|HDFS Commands
Guide: balancer]].
  == What is the purpose of the secondary name-node? ==
  The term "secondary name-node" is somewhat misleading. It is not a name-node in the sense
that data-nodes cannot connect to the secondary name-node, and in no event it can replace
the primary name-node in case of its failure.
- The only purpose of the secondary name-node is to perform periodic checkpoints. The secondary
name-node periodically downloads current name-node image and edits log files, joins them into
new image and uploads the new image back to the (primary and the only) name-node. See [[http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Secondary+Namenode|User Guide]].
+ The only purpose of the secondary name-node is to perform periodic checkpoints. The secondary
name-node periodically downloads current name-node image and edits log files, joins them into
new image and uploads the new image back to the (primary and the only) name-node. See [[http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Secondary+NameNode|User Guide]].
  So if the name-node fails and you can restart it on the same physical node, then there is no need to shut down the data-nodes; only the name-node needs to be restarted. If you cannot use the old node anymore, you will need to copy the latest image somewhere else. The latest image can be found either on the node that used to be the primary before failure, if available, or on the secondary name-node. The latter will be the latest checkpoint without subsequent edits logs, that is, the most recent name space modifications may be missing there. You will also need to restart the whole cluster in this case.
  == Does the name-node stay in safe mode till all under-replicated files are fully replicated? ==
  No. During safe mode replication of blocks is prohibited. The name-node waits until all or a majority of data-nodes report their blocks.
  Depending on how safe mode parameters are configured, the name-node will stay in safe mode until a specific percentage of blocks of the system is ''minimally'' replicated ([[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.replication.min|dfs.replication.min]]). If the safe mode threshold [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.safemode.threshold.pct|dfs.safemode.threshold.pct]] is set to 1, then all blocks of all files should be minimally replicated.
@@ -227, +192 @@

  Learn more about safe mode [[http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Safemode|in
the HDFS Users' Guide]].
  == How do I set up a hadoop node to use multiple volumes? ==
  ''Data-nodes'' can store blocks in multiple directories, typically allocated on different local disk drives. In order to set up multiple directories one needs to specify a comma-separated list of pathnames as the value of the configuration parameter [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.data.dir|dfs.data.dir]]. Data-nodes will attempt to place an equal amount of data in each of the directories.
  The ''name-node'' also supports multiple directories, which in this case store the name space image and the edits log. The directories are specified via the [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.name.dir|dfs.name.dir]] configuration parameter. The name-node directories are used for the name space data replication so that the image and the log could be restored from the remaining volumes if one of them fails.
  == What happens if one Hadoop client renames a file or a directory containing this file
while another client is still writing into it? ==
  Starting with release hadoop-0.15, a file will appear in the name space as soon as it is
created.  If a writer is writing to a file and another client renames either the file itself
or any of its path  components, then the original writer will get an IOException either when
it finishes writing to the current  block or when it closes the file.
  == I want to make a large cluster smaller by taking out a bunch of nodes simultaneously.
How can this be done? ==
  On a large cluster removing one or two data-nodes will not lead to any data loss, because
 name-node will replicate their blocks as long as it will detect that the nodes are dead.
With a large number of nodes getting removed or dying the probability of losing data is higher.
  Hadoop offers the ''decommission'' feature to retire a set of existing data-nodes. The nodes
to be retired should be included into the ''exclude file'', and the exclude file name should
 be specified as a configuration parameter [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.hosts.exclude|dfs.hosts.exclude]].
This file should have been specified during namenode startup. It could be a zero length file.
You must use the full hostname, ip or ip:port format in this file.  Then the shell command
@@ -252, +214 @@

  The decommission process can be terminated at any time by editing the configuration or the
exclude files  and repeating the {{{-refreshNodes}}} command.
  == Wildcard characters don't work correctly in FsShell. ==
  When you issue a command in !FsShell, you may want to apply that command to more than one
file. !FsShell provides a wildcard character to help you do so.  The * (asterisk) character
can be used to take the place of any set of characters. For example, if you would like to
list all the files in your account which begin with the letter '''x''', you could use the
ls command with the * wildcard:
@@ -263, +224 @@

  bin/hadoop dfs -ls 'in*'
  == Can I have multiple files in HDFS use different block sizes? ==
  Yes. HDFS provides an API to specify the block size when you create a file. <<BR>>
See [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long)|FileSystem.create(Path,
overwrite, bufferSize, replication, blockSize, progress)]]
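  A minimal sketch (the path, buffer size, replication factor and the 256MB block size are illustrative assumptions):
{{{
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
long blockSize = 256L * 1024 * 1024;               // block size for this file only
FSDataOutputStream out = fs.create(
    new Path("/user/example/big.dat"),             // destination file
    true,                                          // overwrite if it exists
    conf.getInt("io.file.buffer.size", 4096),      // buffer size
    (short) 3,                                     // replication factor
    blockSize);
out.writeBytes("data stored with a non-default block size\n");
out.close();
}}}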
  == Does HDFS make block boundaries between records? ==
  No, HDFS does not provide a record-oriented API and therefore is not aware of records or the boundaries between them.
  == What happens when two clients try to write into the same HDFS file? ==
  HDFS supports exclusive writes only. <<BR>> When the first client contacts the
name-node to open the file for writing, the name-node grants a lease to the client to create
this file.  When the second client tries to open the same file for writing, the name-node
 will see that the lease for the file is already granted to another client, and will reject
the open request for the second client.
  == How do I limit a DataNode's disk usage? ==
  Use the dfs.datanode.du.reserved configuration value in $HADOOP_HOME/conf/hdfs-site.xml to limit disk usage.
@@ -289, +245 @@

  == On an individual data node, how do you balance the blocks on the disk? ==
  Hadoop currently does not have a method by which to do this automatically.  To do this manually:
   1. Take down the HDFS
-  2. Use the UNIX mv command to move the individual blocks and meta pairs from one directory
to another on each host
+  1. Use the UNIX mv command to move the individual blocks and meta pairs from one directory
to another on each host
-  3. Restart the HDFS
+  1. Restart the HDFS
  == What does "file could only be replicated to 0 nodes, instead of 1" mean? ==
  The NameNode does not have any available DataNodes.  This can be caused by a wide variety
of reasons.  Check the DataNode logs, the NameNode logs, network connectivity, ...
  == If the NameNode loses its only copy of the fsimage file, can the file system be recovered
from the DataNodes? ==
  No.  This is why it is very important to configure dfs.name.dir to write to two filesystems
on different physical hosts, use the secondary NameNode, etc.
  = Platform Specific =
  == Windows ==
  === Building / Testing Hadoop on Windows ===
  The Hadoop build on Windows can be run from inside a Windows (not cygwin) command prompt.
  Whether you set environment variables in a batch file or in System->Properties->Advanced->Environment
Variables, the following environment variables need to be set:
@@ -328, +278 @@

  other targets work similarly. I just wanted to document this because I spent some time trying
to figure out why the ant build would not run from a cygwin command prompt window. If you
are building/testing on Windows, and haven't figured it out yet, this should get you started.
  == Solaris ==
  === Why do files and directories show up as DrWho and/or user names are missing/weird? ===
  Prior to 0.22, Hadoop used the 'whoami' and id commands to determine the user and groups of the running process. whoami ships as part of the BSD compatibility package and is normally not in the path. The id command's output is System V-style whereas Hadoop expects POSIX. Two changes to the environment are required to fix this:
-   1.  Make sure /usr/ucb/whoami is installed and in the path, either by including /usr/ucb
at the tail end of the PATH environment or symlinking /usr/ucb/whoami directly.
+  1. Make sure /usr/ucb/whoami is installed and in the path, either by including /usr/ucb
at the tail end of the PATH environment or symlinking /usr/ucb/whoami directly.
-   1. In hadoop-env.sh, change the HADOOP_IDENT_STRING thusly:
+  1. In hadoop-env.sh, change the HADOOP_IDENT_STRING thusly:
  export HADOOP_IDENT_STRING=`/usr/xpg4/bin/id -u -n`
  === Reported disk capacities are wrong ===
  Hadoop uses du and df to determine disk space used. On pooled storage systems that report the total capacity of the entire pool (such as ZFS) rather than of the individual filesystem, Hadoop gets easily confused. Users have reported that using fixed quota sizes for HDFS and MapReduce directories helps eliminate a lot of this confusion.
