hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "FrontPage" by SteveLoughran
Date Thu, 06 Oct 2011 13:04:47 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "FrontPage" page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/FrontPage?action=diff&rev1=279&rev2=280

Comment:
Refactor the front page, though it's still too long; all it needs is a blink tag to look like
something from 1998

  = Apache Hadoop =
- [[http://hadoop.apache.org/|Apache Hadoop]] is a framework for running applications on large
cluster built of commodity hardware. The Hadoop framework transparently provides applications
both reliability and data motion. Hadoop implements a computational paradigm named [[HadoopMapReduce|Map/Reduce]],
where the application is divided into many small fragments of work, each of which may be executed
or reexecuted on any node in the cluster. In addition, it provides a distributed file system
([[DFS|HDFS]]) that stores data on the compute nodes, providing very high aggregate bandwidth
across the cluster. Both Map/Reduce and the distributed file system are designed so that node
failures are automatically handled by the framework.
+ [[http://hadoop.apache.org/|Apache Hadoop]] is a framework for running applications on large
clusters built of commodity hardware. The Hadoop framework transparently provides applications
with both reliability and data motion. Hadoop implements a computational paradigm named [[HadoopMapReduce|Map/Reduce]],
in which the application is divided into many small fragments of work, each of which may be executed
or re-executed on any node in the cluster. In addition, it provides a distributed file system
([[DFS|HDFS]]) that stores data on the compute nodes, providing very high aggregate bandwidth
across the cluster. Both MapReduce and the Hadoop Distributed File System are designed so
that node failures are handled automatically by the framework.
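  The Map/Reduce split described above can be sketched in plain Python. This is only a local
illustration of the paradigm, not Hadoop's actual Java API: in a real cluster each map and
reduce task would run on a different node, with the framework handling shuffling and retries.
{{{
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    """Map: turn each input record into small (key, value) fragments of work."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: group intermediate pairs by key and combine their values."""
    for word, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

docs = ["the quick brown fox", "the lazy dog"]
counts = dict(reduce_phase(map_phase(docs)))
print(counts["the"])  # → 2; on a cluster, each map task could run on any node
}}}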
  
  == General Information ==
   * [[http://hadoop.apache.org/|Official Apache Hadoop Website]]: download, bug-tracking,
mailing-lists, etc.
- 
   * [[ProjectDescription|Overview]] of Apache Hadoop
+  * [[FAQ]] Frequently Asked Questions.
- 
-  * [[FAQ]] FAQ
- 
   * [[HadoopIsNot|What Hadoop is not]]
- 
   * [[Distributions and Commercial Support]] for Hadoop (RPMs, Debs, AMIs, etc.)
- 
   * [[HadoopPresentations|Presentations]], [[Books|books]], [[HadoopArticles|articles]] and
[[Papers|papers]] about Hadoop
- 
-  * PoweredBy, a list of sites and applications powered by Apache Hadoop
+  * PoweredBy, a growing list of sites and applications powered by Apache Hadoop
- 
   * Support
    * [[Help|Getting help from the Hadoop community]].
- 
    * [[Support|People and companies for hire]].
- 
   * [[Conferences|Hadoop Community Events and Conferences]]
    * HadoopUserGroups (HUGs)
- 
    * HadoopSummit
- 
    * HadoopWorld
  
-  * [[http://developer.yahoo.com/hadoop/tutorial/|Yahoo! Hadoop Tutorial]]: A thorough tutorial
covering Hadoop setup, HDFS, and [[HadoopMapReduce|MapReduce]]
- 
-  * [[http://www.cloudera.com/hadoop-training-basic|Cloudera Online Hadoop Training]]: Video
lectures, exercises and a pre-configured [[http://www.cloudera.com/hadoop-training-virtual-machine|virtual
machine]] to follow along. Sessions cover [[http://www.cloudera.com/hadoop-training-programming-with-hadoop|Hadoop]],
[[http://www.cloudera.com/hadoop-training-mapreduce-algorithms|MapReduce]], [[http://www.cloudera.com/hadoop-training-hive-introduction|Hive]],
[[http://www.cloudera.com/hadoop-training-pig-introduction|Pig]] and more.
- 
-  * [[http://marakana.com/training/java/hadoop.html|Marakana Hadoop Training]]: 3-day training
program in San Francisco with  [[http://marakana.com/expert/srisatish_ambati,10809.html|Srisatish
Ambati]] Program is geared to give developers hands-on working knowledge for harnessing the
power of Hadoop in their organizations. 
+ === Related-Projects ===
+  * [[HBase]], a Bigtable-like structured storage system for Hadoop HDFS
+  * [[http://wiki.apache.org/pig/|Apache Pig]] is a high-level data-flow language and execution
framework for parallel computation. It is built on top of Hadoop Core.
+  * [[Hive]], a data warehouse infrastructure that allows SQL-like ad-hoc querying of data
(in any format) stored in Hadoop
+  * ZooKeeper is a high-performance coordination service for distributed applications.
+  * [[http://wiki.apache.org/hama|Hama]], a distributed computing framework similar to Google's
Pregel, based on BSP (Bulk Synchronous Parallel) computing techniques for massive scientific computations.
+  * [[http://lucene.apache.org/mahout|Mahout]], scalable Machine Learning algorithms using
Hadoop
  
  == User Documentation ==
   * [[HadoopJavaVersions|Available Java Runtime Environments for Hadoop]]
- 
   * ImportantConcepts
- 
   * GettingStartedWithHadoop (lots of details and explanation)
- 
   * QuickStart (for those who just want it to work ''now'')
- 
-  * [[http://hadoop.apache.org/core/docs/current/commands_manual.html|Command Line Options]]
for hadoop shell script.
+  * [[http://hadoop.apache.org/core/docs/current/commands_manual.html|Command Line Options]]
for the Hadoop shell scripts.
- 
   * [[HadoopOverview|Hadoop Code Overview]]
- 
   * [[TroubleShooting|Troubleshooting]] What to do when things go wrong
  
+ === Setting up a Hadoop Cluster ===
   * [[Setup|Setting up a Hadoop Cluster]]
-   * [[Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)]] (tutorial on installing, configuring
and running Hadoop on a single machine)
+  * [[Running_Hadoop_On_OS_X_10.5_64-bit_(Single-Node_Cluster)]]
+  * HowToConfigure Hadoop software
+  * [[WebApp_URLs|WebApps for monitoring your system]]
+  * [[NameNodeFailover|How to handle name node failure]]
+  * [[GangliaMetrics|How to get metrics into ganglia]]
+  * [[LargeClusterTips|Tips for managing a large cluster]]
+  * [[DiskSetup|Disk Setup: some suggestions]]
+  * [[PerformanceTuning|Performance:]] getting extra throughput
+  * [[topology_rack_awareness_scripts|Topology Scripts / Rack Awareness]]
  
-   * [[Running_Hadoop_On_OS_X_10.5_64-bit_(Single-Node_Cluster)]]
+  * Virtual Clusters including Amazon AWS
+   * [[Virtual Hadoop]] -the theory
+   * How to set up a [[VirtualCluster|Virtual Cluster]]
+   * Running Hadoop on [[AmazonEC2]]
+   * Running Hadoop with AmazonS3
  
-   * [[Virtual Hadoop]]  -how to set up a [[VirtualCluster|Virtual Cluster]]
  
-   * HowToConfigure Hadoop software
+ === Tutorials ===
+  * [[Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)]] A tutorial on installing, configuring
and running Hadoop on a single Ubuntu Linux machine.
+  * [[http://www.cloudera.com/hadoop-training-basic|Cloudera basic training]]
+  * [[http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html|Hadoop Windows/Eclipse Tutorial]]:
How to develop Hadoop with Eclipse on Windows.
+  * [[http://developer.yahoo.com/hadoop/tutorial/|Yahoo! Hadoop Tutorial]]: Hadoop setup,
HDFS, and [[HadoopMapReduce|MapReduce]]
  
+ === MapReduce ===
+ MapReduce is the foundational computational model of Hadoop, and is critical to understand.
-   * [[WebApp_URLs|WebApps for monitoring your system]]
- 
-   * [[NameNodeFailover|How to handle name node failure]]
- 
-   * [[GangliaMetrics|How to get metrics into ganglia]]
- 
-   * [[LargeClusterTips|Tips for managing a large cluster]]
- 
-   * [[DiskSetup|Disk Setup: some suggestions]]
- 
-   * [[PerformanceTuning|Performance:]] getting extra throughput
- 
-   * [[http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html|Hadoop Windows/Eclipse Tutorial]]:
Tutorial on how to setup and configure Hadoop development cluster for Windows and Eclipse.
- 
-   * [[topology_rack_awareness_scripts|Topology Scripts / Rack Awareness]]
- 
-  * Map/Reduce
-   * HadoopMapReduce
+  * HadoopMapReduce
- 
-   * HadoopMapRedClasses
+  * HadoopMapRedClasses
- 
-   * HowManyMapsAndReduces
+  * HowManyMapsAndReduces
- 
-   * TaskExecutionEnvironment
+  * TaskExecutionEnvironment
- 
-   * HowToDebugMapReducePrograms
+  * HowToDebugMapReducePrograms
  
   * Examples
    * WordCount
- 
    * [[PythonWordCount|Python Word Count]]
- 
    * [[C++WordCount|C/C++ Word Count]]
- 
    * [[Grep]]
- 
    * [[Sort]]
- 
    * RandomWriter
- 
    * [[HadoopDfsReadWriteExample|How to read from and write to HDFS]]
- 
-  * Amazon
-   * Running Hadoop on [[AmazonEC2]]
- 
-   * Running Hadoop with AmazonS3
  
   * Benchmarks
    * [[HardwareBenchmarks|Hardware benchmarks]]
- 
    * [[DataProcessingBenchmarks|Data processing benchmarks]]
  
-  * Related-Projects
-   * [[HBase]], a Bigtable-like structured storage system for Hadoop HDFS
  
+  * Contributed parts of the Hadoop codebase
+  These are independent modules that ship in the Hadoop codebase but are not yet tightly integrated
with the main project.
-   * [[http://wiki.apache.org/pig/|Apache Pig]] is a high-level data-flow language and execution
framework for parallel computation. It is built on top of Hadoop Core.
- 
-   * [[Hive]] a data warehouse infrastructure which allows sql-like adhoc querying of data
(in any format) stored in Hadoop
- 
-   * ZooKeeper is a high-performance coordination service for distributed applications.
- 
-  * Contrib
    * HadoopStreaming (Useful for using Hadoop with other programming languages)
- 
    * DistributedLucene, a Proposal for a distributed Lucene index in Hadoop
- 
    * [[MountableHDFS]], Fuse-DFS & other Tools to mount HDFS as a standard filesystem
on Linux (and some other Unix OSs)
+   * [[HDFS-APIs]] in Perl, Python, PHP and other languages.
- 
-   * [[HDFS-APIs]] in perl, python, php, etc
- 
    * [[Chukwa]] a data collection, storage, and analysis framework
- 
    * [[EclipsePlugIn|The Apache Hadoop Plugin for Eclipse]] (An Eclipse plug-in that simplifies
the creation and deployment of MapReduce programs with an HDFS Administrative feature)
- 
    * [[HDFS-RAID]] Erasure Coding in HDFS
  
  == Developer Documentation ==
   * [[Roadmap]], listing release plans.
- 
   * HowToContribute
- 
   * HowToDevelopUnitTests
- 
   * HowToUseInjectionFramework
- 
   * HowToUseSystemTestFramework
- 
   * HowToSetupYourDevelopmentEnvironment
- 
   * HowToUseConcurrencyAnalysisTools
- 
   * [[HowToUseJCarder]]
- 
   * [[CodeReviewChecklist|HowToCodeReview]]
- 
   * [[Jira]] usage guidelines
- 
   * HowToCommit
- 
   * HowToRelease
- 
   * HudsonBuildServer
- 
   * HowToSetupUbuntuBuildMachine
- 
   * DevelopmentHints
- 
   * ProjectSuggestions
- 
   * [[HadoopUnderIDEA|Building/Testing under IntelliJ IDEA]]
- 
   * [[GitAndHadoop|Git And Hadoop]]
- 
   * ProjectSplit
  
  == Related Resources ==
   * [[http://wiki.apache.org/nutch/NutchHadoopTutorial|Nutch Hadoop Tutorial]] (Useful for
understanding Hadoop in an application context)
- 
   * [[http://www.alphaworks.ibm.com/tech/mapreducetools|IBM MapReduce Tools for Eclipse]]
- Out of date. Use the Eclipse Plugin in the MapReduce/Contrib instead
- 
   * Hadoop IRC channel is #hadoop at irc.freenode.net.
- 
   * [[http://www.tom-doehler.de/wordpress/index.php/2007/12/19/spring-and-hadoop/|Using Spring
and Hadoop]] (Discussion of possibilities to use Hadoop and Dependency Injection with Spring)
- 
-  * [[http://wiki.apache.org/hama|Hama]], a Google's Pregel-like distributed computing framework
based on BSP (Bulk Synchronous Parallel) computing techniques for massive scientific computations.
- 
-  * [[http://lucene.apache.org/mahout|Mahout]], scalable Machine Learning algorithms using
Hadoop
- 
   * [[http://www.wheregridenginelives.com/content/big-data-big-compute-grid-engine-and-hadoop-0|Univa
Grid Engine Integration]] A blog post about integrating Hadoop with Grid Engine's successor,
Univa Grid Engine
- 
   * [[http://philippeadjiman.com/blog/the-hadoop-tutorial-series/|Hadoop Tutorial Series]]
Learning progressively important core Hadoop concepts with hands-on experiments using the
Cloudera Virtual Machine
- 
   * [[http://pydoop.sourceforge.net|Pydoop]] A Python MapReduce and HDFS API for Hadoop.
- 
   * [[https://github.com/klbostee/dumbo/wiki|Dumbo]] Dumbo is a project that allows you to
easily write and run Hadoop programs in Python.
- 
   * [[http://www.asterdata.com/news/091001-Aster-Hadoop-connector.php|Aster Data Hadoop Connector]],
which enables fast transfer of data between Hadoop and Aster Data's MPP data warehouse.
- 
   * [[CUDA On Hadoop|Hadoop + CUDA]]
- 
   * [[http://kazman.shidler.hawaii.edu/ArchDoc.html|HDFS Architecture Documentation]] An
overview of the HDFS architecture, intended for contributors.
  
  ----
