From: Apache Wiki
To: Apache Wiki
Reply-To: common-dev@hadoop.apache.org
Date: Wed, 08 Dec 2010 07:52:23 -0000
Message-ID: <20101208075223.60300.47141@eosnew.apache.org>
Subject: [Hadoop Wiki] Trivial Update of "FAQ" by GabrielReid

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "FAQ" page has been changed by GabrielReid.
The comment on this change is: Corrected link to HDFS user guide for use of secondary namenode.
http://wiki.apache.org/hadoop/FAQ?action=diff&rev1=86&rev2=87

--------------------------------------------------

#pragma section-numbers on

'''Hadoop FAQ'''

<>

= General =

== What is Hadoop? ==

[[http://hadoop.apache.org/core/|Hadoop]] is a distributed computing platform written in Java. It incorporates features similar to those of the [[http://en.wikipedia.org/wiki/Google_File_System|Google File System]] and of [[http://en.wikipedia.org/wiki/MapReduce|MapReduce]]. For some details, see HadoopMapReduce.

== What platform does Hadoop run on? ==

 1. Java 1.6.x or higher, preferably from Sun; see HadoopJavaVersions.
 1. Linux and Windows are the supported operating systems, but BSD, Mac OS/X, and OpenSolaris are known to work. (Windows requires the installation of [[http://www.cygwin.com/|Cygwin]].)

== How well does Hadoop scale? ==

Hadoop has been demonstrated on clusters of up to 4000 nodes. Sort performance on 900 nodes is good (sorting 9TB of data on 900 nodes takes around 1.8 hours) and [[attachment:sort900-20080115.png|improving]] using these non-default configuration values:

 * `dfs.block.size = 134217728`
@@ -38, +33 @@
 * `mapred.child.java.opts = -Xmx1024m`
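These properties can also be applied per job rather than cluster-wide. The sketch below is only an illustration against the old `org.apache.hadoop.mapred` API; the class name is arbitrary, and the values are the ones listed above, not a recommendation for any particular cluster:

{{{
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TunedJobDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TunedJobDriver.class);
    conf.setJobName("tuned-identity-job");
    // Per-job overrides of the tuning values listed above (they take precedence over *-site.xml).
    conf.setLong("dfs.block.size", 134217728L);        // 128 MB blocks for files this job writes
    conf.set("mapred.child.java.opts", "-Xmx1024m");   // 1 GB heap for each map/reduce task
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);                            // identity mapper/reducer by default
  }
}
}}}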
== What kind of hardware scales best for Hadoop? ==

The short answer is dual processor/dual core machines with 4-8GB of RAM using ECC memory. To be most cost-effective, machines should be moderately high-end commodity hardware rather than desktop-class machines; they typically cost 1/2 to 2/3 the price of normal production application servers, which tends to work out to $2-5K. For a more detailed discussion, see the MachineScaling page.

== How does GridGain compare to Hadoop? ==

!GridGain does not support data-intensive jobs. For more details, see HadoopVsGridGain.

== I have a new node I want to add to a running Hadoop cluster; how do I start services on just one node? ==

This also applies to the case where a machine has crashed and rebooted, etc., and you need to get it to rejoin the cluster. You do not need to shut down and/or restart the entire cluster in this case.

First, add the new node's DNS name to the conf/slaves file on the master node.
@@ -58, +50 @@

{{{
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker
}}}

== Is there an easy way to see the status and health of a cluster? ==

There are web-based interfaces to both the JobTracker (MapReduce master) and NameNode (HDFS master) which display status pages about the state of the entire system. By default, these are located at http://job.tracker.addr:50030/ and http://name.node.addr:50070/.

The JobTracker status page displays the state of all nodes, as well as the job queue and the status of all currently running jobs and tasks. The NameNode status page displays the state of all nodes and the amount of free space, and provides the ability to browse the DFS via the web.
@@ -70, +60 @@

{{{
$ bin/hadoop dfsadmin -report
}}}

== How much network bandwidth might I need between racks in a medium size (40-80 node) Hadoop cluster? ==

The true answer depends on the types of jobs you're running. As a back-of-the-envelope calculation, one might figure something like this:

 * 60 nodes total on 2 racks = 30 nodes per rack.
 * Each node might process about 100MB/sec of data.
 * In the case of a sort job, where the intermediate data is the same size as the input data, each node needs to shuffle 100MB/sec of data.
 * In aggregate, each rack is then producing about 3GB/sec of data.
 * Given an even reducer spread across the racks, each rack needs to send about 1.5GB/sec to reducers running on the other rack. Since the connection is full duplex, that means you need 1.5GB/sec of bisection bandwidth for this theoretical job. So that's 12Gbps.
@@ -82, +70 @@

So, the simple answer is that 4-6Gbps is most likely just fine for most practical jobs. If you want to be extra safe, many inexpensive switches can operate in a "stacked" configuration where the bandwidth between them is essentially backplane speed. That should scale you to 96 nodes with plenty of headroom. Many inexpensive gigabit switches also have one or two 10GigE ports which can be used effectively to connect to each other or to a 10GE core.
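To make the arithmetic above easy to re-run with your own assumptions, here is a throwaway sketch; the node count and the 100MB/sec per-node shuffle rate are the assumed figures from the example, not measurements:

{{{
public class BisectionBandwidthEstimate {
  public static void main(String[] args) {
    int nodes = 60, racks = 2;                        // 30 nodes per rack
    double perNodeShuffleMBps = 100.0;                // assumed per-node shuffle rate
    double perRackGBps = (nodes / racks) * perNodeShuffleMBps / 1000.0;  // ~3 GB/sec per rack
    double crossRackGBps = perRackGBps / 2.0;         // even reducer spread: half crosses racks
    double crossRackGbps = crossRackGBps * 8.0;       // ~12 Gbps of bisection bandwidth
    System.out.printf("per rack: %.1f GB/sec, cross-rack: %.1f GB/sec (%.0f Gbps)%n",
        perRackGBps, crossRackGBps, crossRackGbps);
  }
}
}}}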
== How can I help to make Hadoop better? ==

If you have trouble figuring out how to use Hadoop, then, once you've figured something out (perhaps with the help of the [[http://hadoop.apache.org/core/mailing_lists.html|mailing lists]]), pass that knowledge on to others by adding something to this wiki.

If you find something that you wish were done better, and know how to fix it, read HowToContribute, and contribute a patch.

== I am seeing connection refused in the logs. How do I troubleshoot this? ==

See ConnectionRefused.

== Does Hadoop require SSH? ==

The Hadoop-provided scripts (e.g., start-mapred.sh and start-dfs.sh) use ssh in order to start and stop the various daemons, as do some other utilities. The Hadoop framework in itself does not '''require''' ssh. Daemons (e.g. TaskTracker and DataNode) can also be started manually on each node without the scripts' help.

= MapReduce =

== Do I have to write my job in Java? ==

No. There are several ways to incorporate non-Java code.

 * HadoopStreaming permits any shell command to be used as a map or reduce function.
@@ -106, +89 @@
 * [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/pipes/package-summary.html|Hadoop Pipes]], a [[http://www.swig.org/|SWIG]]-compatible C++ API (non-JNI) to write map-reduce jobs.

== What is the Distributed Cache used for? ==

The distributed cache is used to distribute large read-only files that are needed by map/reduce jobs to the cluster. The framework will copy the necessary files from a URL (either hdfs: or http:) onto the slave node before any tasks for the job are executed on that node. The files are only copied once per job, and so should not be modified by the application.

== Can I create/write-to HDFS files directly from map/reduce tasks? ==

Yes. (Clearly, you want this since you need to create/write-to files other than the output file written out by [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/OutputCollector.html|OutputCollector]].)

Caveats:
@@ -130, +111 @@

The entire discussion holds true for the maps of jobs with reducer=NONE (i.e. 0 reduces), since the output of the map, in that case, goes directly to HDFS.

== How do I get each of a job's maps to work on one complete input file and not allow the framework to split up the files? ==

Essentially a job's input is represented by the [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html|InputFormat]] (interface) / [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html|FileInputFormat]] (base class).

For this purpose one would need a 'non-splittable' [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html|FileInputFormat]], i.e. an input format which tells the map-reduce framework that its files cannot be split up and must be processed whole. To do this you need your particular input format to return '''false''' for the [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html#isSplitable(org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path)|isSplitable]] call.
@@ -142, +122 @@

The other, quick-fix option is to set [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.min.split.size|mapred.min.split.size]] to a large enough value.
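As an illustration of the non-splittable approach described above, a sketch against the old `org.apache.hadoop.mapred` API can be as small as this (the class name is just an example):

{{{
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Each input file becomes exactly one split, and therefore exactly one map task.
public class NonSplittableTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;   // never split, regardless of file size or block size
  }
}
}}}

A job would then select it with `conf.setInputFormat(NonSplittableTextInputFormat.class)`.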
== Why do I see broken images in the jobdetails.jsp page? ==

In hadoop-0.15, Map/Reduce task completion graphics were added. The graphs are produced as SVG (Scalable Vector Graphics) images, which are basically XML files embedded in HTML content. The graphics have been tested successfully in Firefox 2 on Ubuntu and Mac OS X. For other browsers, one may need to install an additional browser plugin to see the SVG images; Adobe's SVG Viewer can be found at http://www.adobe.com/svg/viewer/install/.

== I see a maximum of 2 maps/reduces spawned concurrently on each TaskTracker; how do I increase that? ==

Use the configuration knobs [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.tasktracker.map.tasks.maximum|mapred.tasktracker.map.tasks.maximum]] and [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#mapred.tasktracker.reduce.tasks.maximum|mapred.tasktracker.reduce.tasks.maximum]] to control the number of maps/reduces spawned simultaneously on a !TaskTracker. By default, each is set to ''2'', hence one sees a maximum of 2 maps and 2 reduces at a given instant on a !TaskTracker.

You can set these on a per-tasktracker basis to accurately reflect your hardware (i.e. set them to higher values on a beefier tasktracker, etc.).

== Submitting map/reduce jobs as a different user doesn't work. ==

The problem is that you haven't configured your map/reduce system directory to a fixed value. The default works for single-node systems, but not for "real" clusters. I like to use:
@@ -166, +143 @@

Note that this directory is in your default file system, must be accessible from both the client and server machines, and is typically in HDFS.

== How do Map/Reduce InputSplits handle record boundaries correctly? ==

It is the responsibility of the InputSplit's RecordReader to start and end at a record boundary. For SequenceFiles, every 2k bytes has a 20-byte '''sync''' mark between the records. These sync marks allow the RecordReader to seek to the start of the InputSplit (which contains a file, offset and length) and find the first sync mark after the start of the split. The RecordReader continues processing records until it reaches the first sync mark after the end of the split. The first split of each file naturally starts immediately, and not after the first sync mark. In this way, it is guaranteed that each record will be processed by exactly one mapper.

Text files are handled similarly, using newlines instead of sync marks.

== How do I change the final output file names to something other than partitions like part-00000, part-00001? ==

You can subclass the [[http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/OutputFormat.java?view=markup|OutputFormat.java]] class and write your own. You can look at the code of [[http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/TextOutputFormat.java?view=markup|TextOutputFormat]], [[http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/MultipleOutputFormat.java?view=markup|MultipleOutputFormat.java]], etc. for reference. It may be that you only need to make minor changes to one of the existing output format classes; to do that you can just subclass that class and override the methods you need to change.
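A minimal sketch of that subclassing approach against the old `org.apache.hadoop.mapred` API (the "myoutput" prefix and the class name are arbitrary examples, not part of Hadoop):

{{{
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Progressable;

// Produces files named myoutput-00000, myoutput-00001, ... instead of part-NNNNN.
public class RenamingTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
  @Override
  public RecordWriter<K, V> getRecordWriter(FileSystem ignored, JobConf job,
                                            String name, Progressable progress)
      throws IOException {
    // "name" arrives as e.g. "part-00000"; keep the partition number, swap the prefix.
    return super.getRecordWriter(ignored, job, name.replaceFirst("^part", "myoutput"), progress);
  }
}
}}}

Select it in the job driver with `conf.setOutputFormat(RenamingTextOutputFormat.class)`.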
== When writing a new InputFormat, what is the format for the array of strings returned by InputSplit#getLocations()? ==

It appears that DatanodeID.getHost() is the standard place to retrieve this name, and the machineName variable, populated in DataNode.java#startDataNode, is where the name is first set. The first method attempted is to get "slave.host.name" from the configuration; if that is not available, DNS.getDefaultHost is used instead.

== How do you gracefully stop a running job? ==

{{{
hadoop job -kill JOBID
}}}

== How do I limit (or increase) the number of concurrent tasks a job may have running in total at a time? ==

== How do I limit (or increase) the number of concurrent tasks running on a node? ==

For both answers, see LimitingTaskSlotUsage.

= HDFS =

== If I add new DataNodes to the cluster will HDFS move the blocks to the newly added nodes in order to balance disk space utilization between the nodes? ==

No, HDFS will not move blocks to new nodes automatically. However, newly created files will likely have their blocks placed on the new nodes.

There are several ways to rebalance the cluster manually.
@@ -209, +176 @@
 * [[http://hadoop.apache.org/core/docs/current/commands_manual.html#balancer|HDFS Commands Guide: balancer]].

== What is the purpose of the secondary name-node? ==

The term "secondary name-node" is somewhat misleading. It is not a name-node in the sense that data-nodes cannot connect to the secondary name-node, and in no event can it replace the primary name-node in case of failure.

- The only purpose of the secondary name-node is to perform periodic checkpoints. The secondary name-node periodically downloads the current name-node image and edits log files, joins them into a new image and uploads the new image back to the (primary and the only) name-node. See [[http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Secondary+Namenode|User Guide]].
+ The only purpose of the secondary name-node is to perform periodic checkpoints. The secondary name-node periodically downloads the current name-node image and edits log files, joins them into a new image and uploads the new image back to the (primary and the only) name-node. See [[http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Secondary+NameNode|User Guide]].

So if the name-node fails and you can restart it on the same physical node, then there is no need to shut down the data-nodes; only the name-node needs to be restarted. If you cannot use the old node anymore, you will need to copy the latest image somewhere else. The latest image can be found either on the node that used to be the primary before the failure, if available, or on the secondary name-node. The latter will be the latest checkpoint without the subsequent edits logs, that is, the most recent name space modifications may be missing there. You will also need to restart the whole cluster in this case.

== Does the name-node stay in safe mode till all under-replicated files are fully replicated? ==

No. During safe mode replication of blocks is prohibited. The name-node waits until all or a majority of data-nodes report their blocks.

Depending on how the safe mode parameters are configured, the name-node will stay in safe mode until a specific percentage of blocks of the system is ''minimally'' replicated ([[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.replication.min|dfs.replication.min]]). If the safe mode threshold ([[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.safemode.threshold.pct|dfs.safemode.threshold.pct]]) is set to 1 then all blocks of all files should be minimally replicated.
@@ -227, +192 @@

Learn more about safe mode [[http://hadoop.apache.org/hdfs/docs/current/hdfs_user_guide.html#Safemode|in the HDFS Users' Guide]].
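Safe mode can also be queried programmatically. The following is a rough sketch against the 0.20-era HDFS client API (the `FSConstants` import path is an assumption tied to that era); it is the programmatic counterpart of `hadoop dfsadmin -safemode get`:

{{{
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.FSConstants;

public class SafeModeCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    if (fs instanceof DistributedFileSystem) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      // SAFEMODE_GET only queries the current state; it does not change it.
      boolean inSafeMode = dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_GET);
      System.out.println("NameNode in safe mode: " + inSafeMode);
    }
  }
}
}}}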
== How do I set up a Hadoop node to use multiple volumes? ==

''Data-nodes'' can store blocks in multiple directories, typically allocated on different local disk drives. In order to set up multiple directories, one needs to specify a comma-separated list of pathnames as the value of the configuration parameter [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.data.dir|dfs.data.dir]]. Data-nodes will attempt to place an equal amount of data in each of the directories.

The ''name-node'' also supports multiple directories, which in this case store the name space image and the edits log. The directories are specified via the [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.name.dir|dfs.name.dir]] configuration parameter. The name-node directories are used for name space data replication, so that the image and the log can be restored from the remaining volumes if one of them fails.

== What happens if one Hadoop client renames a file or a directory containing this file while another client is still writing into it? ==

Starting with release hadoop-0.15, a file will appear in the name space as soon as it is created. If a writer is writing to a file and another client renames either the file itself or any of its path components, then the original writer will get an IOException either when it finishes writing to the current block or when it closes the file.

== I want to make a large cluster smaller by taking out a bunch of nodes simultaneously. How can this be done? ==

On a large cluster, removing one or two data-nodes will not lead to any data loss, because the name-node will replicate their blocks as soon as it detects that the nodes are dead. With a large number of nodes getting removed or dying, the probability of losing data is higher.

Hadoop offers the ''decommission'' feature to retire a set of existing data-nodes. The nodes to be retired should be included in the ''exclude file'', and the exclude file name should be specified as the configuration parameter [[http://hadoop.apache.org/core/docs/current/hadoop-default.html#dfs.hosts.exclude|dfs.hosts.exclude]]. This file should have been specified during namenode startup. It could be a zero-length file. You must use the full hostname, ip or ip:port format in this file. Then the shell command
@@ -252, +214 @@

The decommission process can be terminated at any time by editing the configuration or the exclude files and repeating the {{{-refreshNodes}}} command.

== Wildcard characters don't work correctly in FsShell. ==

When you issue a command in !FsShell, you may want to apply that command to more than one file. !FsShell provides a wildcard character to help you do so. The * (asterisk) character can be used to take the place of any set of characters. For example, if you would like to list all the files in your account which begin with the letter '''x''', you could use the ls command with the * wildcard:
@@ -263, +224 @@

{{{
bin/hadoop dfs -ls 'in*'
}}}

== Can I have multiple files in HDFS use different block sizes? ==

Yes. HDFS provides an API to specify the block size when you create a file.
See [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html#create(org.apache.hadoop.fs.Path,%20boolean,%20int,%20short,%20long)|FileSystem.create(Path, overwrite, bufferSize, replication, blockSize, progress)]].
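For instance, a small sketch (the path, replication factor and 32MB block size below are only illustrative):

{{{
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateWithCustomBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    long blockSize = 32L * 1024 * 1024;               // 32 MB blocks for this file only
    FSDataOutputStream out = fs.create(
        new Path("/tmp/small-block-file.dat"),        // example path
        true,                                         // overwrite if the file exists
        conf.getInt("io.file.buffer.size", 4096),     // buffer size
        (short) 3,                                    // replication factor
        blockSize);
    out.writeBytes("written with a non-default block size\n");
    out.close();
  }
}
}}}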
== Does HDFS make block boundaries between records? ==

No, HDFS does not provide a record-oriented API and therefore is not aware of records and boundaries between them.

== What happens when two clients try to write into the same HDFS file? ==

HDFS supports exclusive writes only.
When the first client contacts the name-node to open the file for writing, the name-node grants a lease to the client to create this file. When the second client tries to open the same file for writing, the name-node will see that the lease for the file is already granted to another client, and will reject the open request for the second client.

== How do I limit a DataNode's disk usage? ==

Use the dfs.datanode.du.reserved configuration value in $HADOOP_HOME/conf/hdfs-site.xml to limit disk usage.
@@ -289, +245 @@

== On an individual data node, how do you balance the blocks on the disk? ==

Hadoop currently does not have a method by which to do this automatically. To do this manually:

 1. Take down the HDFS.
 1. Use the UNIX mv command to move the individual blocks and meta pairs from one directory to another on each host.
 1. Restart the HDFS.

== What does "file could only be replicated to 0 nodes, instead of 1" mean? ==

The NameNode does not have any available DataNodes. This can be caused by a wide variety of reasons. Check the DataNode logs, the NameNode logs, network connectivity, ...

== If the NameNode loses its only copy of the fsimage file, can the file system be recovered from the DataNodes? ==

No. This is why it is very important to configure dfs.name.dir to write to two filesystems on different physical hosts, use the secondary NameNode, etc.

= Platform Specific =

== Windows ==

=== Building / Testing Hadoop on Windows ===

The Hadoop build on Windows can be run from inside a Windows (not Cygwin) command prompt window.

Whether you set environment variables in a batch file or in System->Properties->Advanced->Environment Variables, the following environment variables need to be set:
@@ -328, +278 @@

Other targets work similarly. I just wanted to document this because I spent some time trying to figure out why the ant build would not run from a Cygwin command prompt window. If you are building/testing on Windows and haven't figured it out yet, this should get you started.

== Solaris ==

=== Why do files and directories show up as DrWho and/or user names are missing/weird? ===

Prior to 0.22, Hadoop uses the whoami and id commands to determine the user and groups of the running process. whoami ships as part of the BSD compatibility package and is normally not in the path. The id command's output is System V-style, whereas Hadoop expects POSIX. Two changes to the environment are required to fix this:

 1. Make sure /usr/ucb/whoami is installed and in the path, either by including /usr/ucb at the tail end of the PATH environment or by symlinking /usr/ucb/whoami directly.
 1. In hadoop-env.sh, change the HADOOP_IDENT_STRING thusly:

{{{
export HADOOP_IDENT_STRING=`/usr/xpg4/bin/id -u -n`
}}}

=== Reported disk capacities are wrong ===

Hadoop uses du and df to determine disk space used.
On pooled storage systems that report the total capacity of the entire pool (such as ZFS) rather than the capacity of the individual filesystem, Hadoop can easily get confused. Users have reported that using fixed quota sizes for the HDFS and MapReduce directories helps eliminate a lot of this confusion.