hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Trivial Update of "ImportantConcepts" by TedDunning
Date Fri, 20 Jul 2007 03:08:48 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by TedDunning:
http://wiki.apache.org/lucene-hadoop/ImportantConcepts

------------------------------------------------------------------------------
  
  * Task - Whereas a job describes all of the inputs, outputs, classes and libraries used
in a map/reduce program, a task is the program that executes the individual map and reduce
steps.
  
- * HDFS - stands for Hadoop Distributed File System.  This is how input and output files
of Hadoop programs are normally stored.  The major advantage of HDFS is that it provides
very high input and output speeds.  This is critical for good performance in highly parallel
programs: as the number of processors working on a problem increases, so does the overall
demand for input data and the overall rate at which output is produced.  HDFS provides very
high bandwidth by storing chunks of files scattered throughout the Hadoop cluster.  Because
files are stored in multiple places, individual tasks can be placed near their input data,
and output data is largely stored where it is created.
+ * [:DFS:HDFS] - stands for Hadoop Distributed File System.  This is how input and output
files of Hadoop programs are normally stored.  The major advantage of HDFS is that it provides
very high input and output speeds.  This is critical for good performance in highly parallel
programs: as the number of processors working on a problem increases, so does the overall
demand for input data and the overall rate at which output is produced.  HDFS provides very
high bandwidth by storing chunks of files scattered throughout the Hadoop cluster.  Because
files are stored in multiple places, individual tasks can be placed near their input data,
and output data is largely stored where it is created.
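The data-locality idea described above can be sketched in a few lines. This is a toy
illustration in plain Python, not Hadoop code: the chunk names, node names, and replica
layout are all made up, and real HDFS placement is far more involved. It only shows the
core scheduling principle: because each chunk is replicated on several nodes, a task can
usually be assigned to a free node that already holds its input.

```python
# Hypothetical replica layout: chunk -> set of nodes holding a copy.
replicas = {
    "part-0": {"node1", "node2"},
    "part-1": {"node2", "node3"},
    "part-2": {"node1", "node3"},
}

def schedule(chunk, free_nodes):
    """Prefer a free node that already stores the chunk (a data-local read);
    otherwise fall back to any free node, which must fetch the chunk remotely.
    Returns (chosen_node, is_data_local)."""
    local = replicas[chunk] & free_nodes
    node = min(local) if local else min(free_nodes)  # deterministic pick
    return node, bool(local)

free = {"node1", "node2", "node3"}
for chunk in replicas:
    node, is_local = schedule(chunk, free)
    free.discard(node)  # node is now busy running this task
    print(chunk, "->", node, "(data-local)" if is_local else "(remote read)")
```

With this layout every task lands on a node that already stores its chunk, so no input
has to cross the network, which is the bandwidth advantage the paragraph above describes.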
  
