hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "HadoopOverview" by MikeCafarella
Date Sun, 23 Apr 2006 05:51:27 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by MikeCafarella:

New page:
= Overview of Hadoop =

Hadoop is a collection of code libraries and programs useful for creating
very large distributed systems.  Much of the code was originally part of the 
Nutch search engine project.

Hadoop includes the following parts:
 * ["conf"], an assortment of classes for handling key-value pairs used in system configuration.
 * ["DFS"], the Hadoop Distributed Filesystem.
 * ["io"], an assortment of I/O-related classes.  It includes a compressed UTF8 string implementation,
code for performing external sorts, and a "poor-man's B-Tree" implementation for looking up
items in large key-value sets.
 * ["ipc"], a fast and easy remote procedure call system.
 * HadoopMapReduce, a distributed job allocation system built on top of DFS.  It employs a
[http://labs.google.com/papers/mapreduce.html MapReduce]-like programming model.
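
To give a rough feel for what a MapReduce-like programming model looks like, here is a minimal single-process sketch in plain Java. It is an illustration only: the class and method names (`WordCountSketch`, `map`, `reduce`, `run`) are hypothetical and are not taken from the Hadoop code, and nothing here is distributed or touches DFS. The idea is simply that a job is expressed as a map step that emits key-value pairs and a reduce step that combines all values sharing a key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process sketch of a MapReduce-like word count.
// None of these names come from Hadoop's own API.
public class WordCountSketch {

    // Map step: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce step: combine all values that were emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, group the emitted values by key, then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the per-word counts for two sample lines.
        System.out.println(run(List.of("the quick fox", "the lazy dog")));
    }
}
```

In a real distributed system, the map calls would presumably run on many machines in parallel and the grouping step would move data between them; the sketch keeps everything in one in-memory map purely to show the shape of the model.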
