From: Apache Wiki
To: Apache Wiki
Reply-To: common-dev@hadoop.apache.org
Date: Wed, 08 Dec 2010 07:52:23 -0000
Message-ID: <20101208075223.60300.47141@eosnew.apache.org>
Subject: [Hadoop Wiki] Trivial Update of "FAQ" by GabrielReid

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "FAQ" page has been changed by GabrielReid.
The comment on this change is: Corrected link to HDFS user guide for use of secondary namenode.
http://wiki.apache.org/hadoop/FAQ?action=diff&rev1=86&rev2=87

--------------------------------------------------

#pragma section-numbers on

'''Hadoop FAQ'''

<>

= General =

== What is Hadoop? ==

[[http://hadoop.apache.org/core/|Hadoop]] is a distributed computing platform written in Java. It incorporates features similar to those of the [[http://en.wikipedia.org/wiki/Google_File_System|Google File System]] and of [[http://en.wikipedia.org/wiki/MapReduce|MapReduce]]. For some details, see HadoopMapReduce.

== What platform does Hadoop run on? ==

 1. Java 1.6.x or higher, preferably from Sun; see HadoopJavaVersions.
 1. Linux and Windows are the supported operating systems, but BSD, Mac OS/X, and OpenSolaris are known to work. (Windows requires the installation of [[http://www.cygwin.com/|Cygwin]].)

== How well does Hadoop scale? ==

Hadoop has been demonstrated on clusters of up to 4000 nodes. Sort performance on 900 nodes is good (sorting 9TB of data on 900 nodes takes around 1.8 hours) and [[attachment:sort900-20080115.png|improving]] using these non-default configuration values:

 * `dfs.block.size = 134217728`
@@ -38, +33 @@
 * `mapred.child.java.opts = -Xmx1024m`
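These properties can also be applied per job rather than cluster-wide. The sketch below is only an illustration against the old `org.apache.hadoop.mapred` API; the class name is arbitrary, and the values are the ones listed above, not a recommendation for any particular cluster:

{{{
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TunedJobDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TunedJobDriver.class);
    conf.setJobName("tuned-identity-job");
    // Per-job overrides of the tuning values listed above (they take precedence over *-site.xml).
    conf.setLong("dfs.block.size", 134217728L);        // 128 MB blocks for files this job writes
    conf.set("mapred.child.java.opts", "-Xmx1024m");   // 1 GB heap for each map/reduce task
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);                            // identity mapper/reducer by default
  }
}
}}}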
== What kind of hardware scales best for Hadoop? ==

The short answer is dual processor/dual core machines with 4-8GB of RAM using ECC memory. To be most cost-effective, machines should be moderately high-end commodity hardware rather than desktop-class machines; they typically cost 1/2 to 2/3 the price of normal production application servers, which tends to work out to $2-5K. For a more detailed discussion, see the MachineScaling page.

== How does GridGain compare to Hadoop? ==

!GridGain does not support data-intensive jobs. For more details, see HadoopVsGridGain.

== I have a new node I want to add to a running Hadoop cluster; how do I start services on just one node? ==

This also applies to the case where a machine has crashed and rebooted, etc., and you need to get it to rejoin the cluster. You do not need to shut down and/or restart the entire cluster in this case.

First, add the new node's DNS name to the conf/slaves file on the master node.
@@ -58, +50 @@

{{{
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker
}}}

== Is there an easy way to see the status and health of a cluster? ==

There are web-based interfaces to both the JobTracker (MapReduce master) and NameNode (HDFS master) which display status pages about the state of the entire system. By default, these are located at http://job.tracker.addr:50030/ and http://name.node.addr:50070/.

The JobTracker status page displays the state of all nodes, as well as the job queue and the status of all currently running jobs and tasks. The NameNode status page displays the state of all nodes and the amount of free space, and provides the ability to browse the DFS via the web.
@@ -70, +60 @@

{{{
$ bin/hadoop dfsadmin -report
}}}

== How much network bandwidth might I need between racks in a medium size (40-80 node) Hadoop cluster? ==

The true answer depends on the types of jobs you're running. As a back-of-the-envelope calculation, one might figure something like this:

 * 60 nodes total on 2 racks = 30 nodes per rack.
 * Each node might process about 100MB/sec of data.
 * In the case of a sort job, where the intermediate data is the same size as the input data, each node needs to shuffle 100MB/sec of data.
 * In aggregate, each rack is then producing about 3GB/sec of data.
 * Given an even reducer spread across the racks, each rack needs to send about 1.5GB/sec to reducers running on the other rack. Since the connection is full duplex, that means you need 1.5GB/sec of bisection bandwidth for this theoretical job. So that's 12Gbps.
@@ -82, +70 @@

So, the simple answer is that 4-6Gbps is most likely just fine for most practical jobs. If you want to be extra safe, many inexpensive switches can operate in a "stacked" configuration where the bandwidth between them is essentially backplane speed. That should scale you to 96 nodes with plenty of headroom. Many inexpensive gigabit switches also have one or two 10GigE ports which can be used effectively to connect to each other or to a 10GE core.
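To make the arithmetic above easy to re-run with your own assumptions, here is a throwaway sketch; the node count and the 100MB/sec per-node shuffle rate are the assumed figures from the example, not measurements:

{{{
public class BisectionBandwidthEstimate {
  public static void main(String[] args) {
    int nodes = 60, racks = 2;                        // 30 nodes per rack
    double perNodeShuffleMBps = 100.0;                // assumed per-node shuffle rate
    double perRackGBps = (nodes / racks) * perNodeShuffleMBps / 1000.0;  // ~3 GB/sec per rack
    double crossRackGBps = perRackGBps / 2.0;         // even reducer spread: half crosses racks
    double crossRackGbps = crossRackGBps * 8.0;       // ~12 Gbps of bisection bandwidth
    System.out.printf("per rack: %.1f GB/sec, cross-rack: %.1f GB/sec (%.0f Gbps)%n",
        perRackGBps, crossRackGBps, crossRackGbps);
  }
}
}}}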
== How can I help to make Hadoop better? ==

If you have trouble figuring out how to use Hadoop, then, once you've figured something out (perhaps with the help of the [[http://hadoop.apache.org/core/mailing_lists.html|mailing lists]]), pass that knowledge on to others by adding something to this wiki.

If you find something that you wish were done better, and know how to fix it, read HowToContribute, and contribute a patch.

== I am seeing connection refused in the logs. How do I troubleshoot this? ==

See ConnectionRefused.

== Does Hadoop require SSH? ==

The Hadoop-provided scripts (e.g., start-mapred.sh and start-dfs.sh) use ssh in order to start and stop the various daemons, as do some other utilities. The Hadoop framework in itself does not '''require''' ssh. Daemons (e.g. TaskTracker and DataNode) can also be started manually on each node without the scripts' help.

= MapReduce =

== Do I have to write my job in Java? ==

No. There are several ways to incorporate non-Java code.

 * HadoopStreaming permits any shell command to be used as a map or reduce function.
@@ -106, +89 @@
 * [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/pipes/package-summary.html|Hadoop Pipes]], a [[http://www.swig.org/|SWIG]]-compatible C++ API (non-JNI) to write map-reduce jobs.

== What is the Distributed Cache used for? ==

The distributed cache is used to distribute large read-only files that are needed by map/reduce jobs to the cluster. The framework will copy the necessary files from a URL (either hdfs: or http:) onto the slave node before any tasks for the job are executed on that node. The files are only copied once per job, and so should not be modified by the application.

== Can I create/write-to HDFS files directly from map/reduce tasks? ==

Yes. (Clearly, you want this since you need to create/write-to files other than the output file written out by [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/OutputCollector.html|OutputCollector]].)

Caveats:
@@ -130, +111 @@

The entire discussion holds true for the maps of jobs with reducer=NONE (i.e. 0 reduces), since the output of the map, in that case, goes directly to HDFS.

== How do I get each of a job's maps to work on one complete input file and not allow the framework to split up the files? ==

Essentially a job's input is represented by the [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html|InputFormat]] (interface) / [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html|FileInputFormat]] (base class).

For this purpose one would need a 'non-splittable' [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html|FileInputFormat]], i.e. an input format which tells the map-reduce framework that its files cannot be split up and must be processed whole. To do this you need your particular input format to return '''false''' for the [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html#isSplitable(org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path)|isSplitable]] call.
@@ -142, +122 @@

The other, quick-fix option is to set [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.min.split.size|mapred.min.split.size]] to a large enough value.
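As an illustration of the non-splittable approach described above, a sketch against the old `org.apache.hadoop.mapred` API can be as small as this (the class name is just an example):

{{{
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Each input file becomes exactly one split, and therefore exactly one map task.
public class NonSplittableTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;   // never split, regardless of file size or block size
  }
}
}}}

A job would then select it with `conf.setInputFormat(NonSplittableTextInputFormat.class)`.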
== Why do I see broken images in the jobdetails.jsp page? ==

In hadoop-0.15, Map/Reduce task completion graphics were added. The graphs are produced as SVG (Scalable Vector Graphics) images, which are basically XML files embedded in HTML content. The graphics have been tested successfully in Firefox 2 on Ubuntu and Mac OS X. For other browsers, one may need to install an additional browser plugin to see the SVG images; Adobe's SVG Viewer can be found at http://www.adobe.com/svg/viewer/install/.

== I see a maximum of 2 maps/reduces spawned concurrently on each TaskTracker; how do I increase that? ==

Use the configuration knobs [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.tasktracker.map.tasks.maximum|mapred.tasktracker.map.tasks.maximum]] and [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.tasktracker.reduce.tasks.maximum|mapred.tasktracker.reduce.tasks.maximum]] to control the number of maps/reduces spawned simultaneously on a !TaskTracker. By default, each is set to ''2'', hence one sees a maximum of 2 maps and 2 reduces at a given instant on a !TaskTracker.

You can set these on a per-tasktracker basis to accurately reflect your hardware (i.e. set them to higher values on a beefier tasktracker, etc.).

== Submitting map/reduce jobs as a different user doesn't work. ==

The problem is that you haven't configured your map/reduce system directory to a fixed value. The default works for single-node systems, but not for "real" clusters. I like to use:
@@ -166, +143 @@

Note that this directory is in your default file system, must be accessible from both the client and server machines, and is typically in HDFS.

== How do Map/Reduce InputSplits handle record boundaries correctly? ==

It is the responsibility of the InputSplit's RecordReader to start and end at a record boundary. For SequenceFiles, every 2k bytes has a 20-byte '''sync''' mark between the records. These sync marks allow the RecordReader to seek to the start of the InputSplit (which contains a file, offset and length) and find the first sync mark after the start of the split. The RecordReader continues processing records until it reaches the first sync mark after the end of the split. The first split of each file naturally starts immediately, and not after the first sync mark. In this way, it is guaranteed that each record will be processed by exactly one mapper.

Text files are handled similarly, using newlines instead of sync marks.

== How do I change the final output file names to something other than partitions like part-00000, part-00001? ==

You can subclass the [[http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/OutputFormat.java?view=markup|OutputFormat.java]] class and write your own. You can look at the code of [[http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/TextOutputFormat.java?view=markup|TextOutputFormat]], [[http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/MultipleOutputFormat.java?view=markup|MultipleOutputFormat.java]], etc. for reference. It may be that you only need to make minor changes to one of the existing output format classes; to do that you can just subclass that class and override the methods you need to change.
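A minimal sketch of that subclassing approach against the old `org.apache.hadoop.mapred` API (the "myoutput" prefix and the class name are arbitrary examples, not part of Hadoop):

{{{
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Progressable;

// Produces files named myoutput-00000, myoutput-00001, ... instead of part-NNNNN.
public class RenamingTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
  @Override
  public RecordWriter<K, V> getRecordWriter(FileSystem ignored, JobConf job,
                                            String name, Progressable progress)
      throws IOException {
    // "name" arrives as e.g. "part-00000"; keep the partition number, swap the prefix.
    return super.getRecordWriter(ignored, job, name.replaceFirst("^part", "myoutput"), progress);
  }
}
}}}

Select it in the job driver with `conf.setOutputFormat(RenamingTextOutputFormat.class)`.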
== When writing a new InputFormat, what is the format for the array of strings returned by InputSplit#getLocations()? ==

It appears that DatanodeID.getHost() is the standard place to retrieve this name, and the machineName variable, populated in DataNode.java#startDataNode, is where the name is first set. The first method attempted is to get "slave.host.name" from the configuration; if that is not available, DNS.getDefaultHost is used instead.

== How do you gracefully stop a running job? ==

{{{
hadoop job -kill JOBID
}}}

== How do I limit (or increase) the number of concurrent tasks a job may have running in total at a time? ==

== How do I limit (or increase) the number of concurrent tasks running on a node? ==

For both answers, see LimitingTaskSlotUsage.

= HDFS =

== If I add new DataNodes to the cluster will HDFS move the blocks to the newly added nodes in order to balance disk space utilization between the nodes? ==

No, HDFS will not move blocks to new nodes automatically. However, newly created files will likely have their blocks placed on the new nodes.

There are several ways to rebalance the cluster manually.
@@ -209, +176 @@
 * [[http://hadoop.apache.org/core/docs/current/commands_manual.html#balancer|HDFS Commands Guide: balancer]].

== What is the purpose of the secondary name-node? ==

The term "secondary name-node" is somewhat misleading. It is not a name-node in the sense that data-nodes cannot connect to the secondary name-node, and in no event can it replace the primary name-node in case of failure.

- The only purpose of the secondary name-node is to perform periodic checkpoints. The secondary name-node periodically downloads the current name-node image and edits log files, joins them into a new image and uploads the new image back to the (primary and the only) name-node. See [[http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Secondary+Namenode|User Guide]].
+ The only purpose of the secondary name-node is to perform periodic checkpoints. The secondary name-node periodically downloads the current name-node image and edits log files, joins them into a new image and uploads the new image back to the (primary and the only) name-node. See [[http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Secondary+NameNode|User Guide]].

So if the name-node fails and you can restart it on the same physical node, then there is no need to shut down the data-nodes; only the name-node needs to be restarted. If you cannot use the old node anymore, you will need to copy the latest image somewhere else. The latest image can be found either on the node that used to be the primary before the failure, if available, or on the secondary name-node. The latter will be the latest checkpoint without the subsequent edits logs, that is, the most recent name space modifications may be missing there. You will also need to restart the whole cluster in this case.

== Does the name-node stay in safe mode till all under-replicated files are fully replicated? ==

No. During safe mode replication of blocks is prohibited. The name-node waits until all or a majority of data-nodes report their blocks.

Depending on how the safe mode parameters are configured, the name-node will stay in safe mode until a specific percentage of blocks of the system is ''minimally'' replicated ([[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.replication.min|dfs.replication.min]]). If the safe mode threshold ([[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.safemode.threshold.pct|dfs.safemode.threshold.pct]]) is set to 1 then all blocks of all files should be minimally replicated.
@@ -227, +192 @@

Learn more about safe mode [[http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Safemode|in the HDFS Users' Guide]].
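Safe mode can also be queried programmatically. The following is a rough sketch against the 0.20-era HDFS client API (the `FSConstants` import path is an assumption tied to that era); it is the programmatic counterpart of `hadoop dfsadmin -safemode get`:

{{{
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.FSConstants;

public class SafeModeCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    if (fs instanceof DistributedFileSystem) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      // SAFEMODE_GET only queries the current state; it does not change it.
      boolean inSafeMode = dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_GET);
      System.out.println("NameNode in safe mode: " + inSafeMode);
    }
  }
}
}}}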
== How do I set up a Hadoop node to use multiple volumes? ==

''Data-nodes'' can store blocks in multiple directories, typically allocated on different local disk drives. In order to set up multiple directories, one needs to specify a comma-separated list of pathnames as the value of the configuration parameter [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.data.dir|dfs.data.dir]]. Data-nodes will attempt to place an equal amount of data in each of the directories.

The ''name-node'' also supports multiple directories, which in this case store the name space image and the edits log. The directories are specified via the [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.name.dir|dfs.name.dir]] configuration parameter. The name-node directories are used for name space data replication, so that the image and the log can be restored from the remaining volumes if one of them fails.

== What happens if one Hadoop client renames a file or a directory containing this file while another client is still writing into it? ==

Starting with release hadoop-0.15, a file will appear in the name space as soon as it is created. If a writer is writing to a file and another client renames either the file itself or any of its path components, then the original writer will get an IOException either when it finishes writing to the current block or when it closes the file.

== I want to make a large cluster smaller by taking out a bunch of nodes simultaneously. How can this be done? ==

On a large cluster, removing one or two data-nodes will not lead to any data loss, because the name-node will replicate their blocks as soon as it detects that the nodes are dead. With a large number of nodes getting removed or dying, the probability of losing data is higher.

Hadoop offers the ''decommission'' feature to retire a set of existing data-nodes. The nodes to be retired should be included in the ''exclude file'', and the exclude file name should be specified as the configuration parameter [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.hosts.exclude|dfs.hosts.exclude]]. This file should have been specified during namenode startup. It could be a zero-length file. You must use the full hostname, ip or ip:port format in this file. Then the shell command
@@ -252, +214 @@

The decommission process can be terminated at any time by editing the configuration or the exclude files and repeating the {{{-refreshNodes}}} command.

== Wildcard characters don't work correctly in FsShell. ==

When you issue a command in !FsShell, you may want to apply that command to more than one file. !FsShell provides a wildcard character to help you do so. The * (asterisk) character can be used to take the place of any set of characters. For example, if you would like to list all the files in your account which begin with the letter '''x''', you could use the ls command with the * wildcard:
@@ -263, +224 @@

{{{
bin/hadoop dfs -ls 'in*'
}}}

== Can I have multiple files in HDFS use different block sizes? ==

Yes. HDFS provides an API to specify the block size when you create a file.
See [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long)|FileSystem.create(Path, overwrite, bufferSize, replication, blockSize, progress)]].
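For instance, a small sketch (the path, replication factor and 32MB block size below are only illustrative):

{{{
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateWithCustomBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    long blockSize = 32L * 1024 * 1024;               // 32 MB blocks for this file only
    FSDataOutputStream out = fs.create(
        new Path("/tmp/small-block-file.dat"),        // example path
        true,                                         // overwrite if the file exists
        conf.getInt("io.file.buffer.size", 4096),     // buffer size
        (short) 3,                                    // replication factor
        blockSize);
    out.writeBytes("written with a non-default block size\n");
    out.close();
  }
}
}}}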
== Does HDFS make block boundaries between records? ==

No, HDFS does not provide a record-oriented API and therefore is not aware of records and boundaries between them.

== What happens when two clients try to write into the same HDFS file? ==

HDFS supports exclusive writes only.
When the first client contacts the name-node to open the file for writing, the name-node grants a lease to the client to create this file. When the second client tries to open the same file for writing, the name-node will see that the lease for the file is already granted to another client, and will reject the open request for the second client.

== How do I limit a DataNode's disk usage? ==

Use the dfs.datanode.du.reserved configuration value in $HADOOP_HOME/conf/hdfs-site.xml to limit disk usage.
@@ -289, +245 @@

== On an individual data node, how do you balance the blocks on the disk? ==

Hadoop currently does not have a method by which to do this automatically. To do this manually:

 1. Take down the HDFS.
 1. Use the UNIX mv command to move the individual blocks and meta pairs from one directory to another on each host.
 1. Restart the HDFS.

== What does "file could only be replicated to 0 nodes, instead of 1" mean? ==

The NameNode does not have any available DataNodes. This can be caused by a wide variety of reasons. Check the DataNode logs, the NameNode logs, network connectivity, ...

== If the NameNode loses its only copy of the fsimage file, can the file system be recovered from the DataNodes? ==

No. This is why it is very important to configure dfs.name.dir to write to two filesystems on different physical hosts, use the secondary NameNode, etc.

= Platform Specific =

== Windows ==

=== Building / Testing Hadoop on Windows ===

The Hadoop build on Windows can be run from inside a Windows (not Cygwin) command prompt window.

Whether you set environment variables in a batch file or in System->Properties->Advanced->Environment Variables, the following environment variables need to be set:
@@ -328, +278 @@

Other targets work similarly. I just wanted to document this because I spent some time trying to figure out why the ant build would not run from a Cygwin command prompt window. If you are building/testing on Windows and haven't figured it out yet, this should get you started.

== Solaris ==

=== Why do files and directories show up as DrWho and/or user names are missing/weird? ===

Prior to 0.22, Hadoop uses the whoami and id commands to determine the user and groups of the running process. whoami ships as part of the BSD compatibility package and is normally not in the path. The id command's output is System V-style, whereas Hadoop expects POSIX. Two changes to the environment are required to fix this:

 1. Make sure /usr/ucb/whoami is installed and in the path, either by including /usr/ucb at the tail end of the PATH environment or by symlinking /usr/ucb/whoami directly.
 1. In hadoop-env.sh, change the HADOOP_IDENT_STRING thusly:

{{{
export HADOOP_IDENT_STRING=`/usr/xpg4/bin/id -u -n`
}}}

=== Reported disk capacities are wrong ===

Hadoop uses du and df to determine disk space used.
On pooled storage systems that report the total capacity of the entire pool (such as ZFS) rather than the capacity of the individual filesystem, Hadoop can easily get confused. Users have reported that using fixed quota sizes for the HDFS and MapReduce directories helps eliminate a lot of this confusion.