hadoop-general mailing list archives

From "Laurent Laborde" <kerdez...@gmail.com>
Subject Re: how to Learning the hadoop
Date Sun, 21 Dec 2008 18:55:55 GMT
On Sat, Dec 20, 2008 at 10:36 AM, samqjf <samqjf@163.com> wrote:
> Ladies and gentlemen, I would like to know what to do hadoop. What software can run on

Hi!
I'm still new to Hadoop, but you can do a lot of things with it.

The 2 key parts of Hadoop are:
- HDFS: a distributed filesystem, similar to the Google File System (GFS).
- MapReduce: well... there are a lot of papers about MapReduce (mostly
from Google, who "invented" it).

As long as you have a dataset, whether small (a few MB) or very large
(many TB, PB?), and a problem that can be parallelized with MapReduce,
Hadoop may be for you :)
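
To give an idea of what "parallelized with MapReduce" means, here is a minimal sketch in plain Python (not Hadoop code, and not the Hadoop API). It counts words in three steps: map, shuffle, reduce. The function names and the sample input are made up for illustration; on a real cluster the map and reduce calls would run on many machines at once.

```python
from collections import defaultdict

def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: combine all values that share one key."""
    return key, sum(values)

def word_count(lines):
    # Each line can be mapped independently, which is what makes the
    # problem easy to spread over many machines.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical sample input, only for illustration.
counts = word_count(["hadoop is cool", "pig is cool too"])
```

If a problem can be split into independent map calls followed by a per-key reduce like this, it is probably a good fit.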

Hadoop has some very interesting subprojects if you're not planning
to deal directly with MapReduce tasks.
- HBase: an open-source implementation modeled on Google BigTable.
- Pig: I don't know how to explain it, but I'm crazy about it. It's
really cool. Very useful for, e.g., parsing huge files (log files, text,
CSV, ...), then filtering, grouping, ordering, ...
- Nutch: a search engine that uses Lucene and Hadoop. Something
similar to Google Search ;)
- and more ...
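
To show the kind of filter / group / order job mentioned for Pig, here is a rough sketch in plain Python. This is not Pig Latin and uses no Hadoop API; the "ip,url,status" log format, the sample lines and the top_urls name are all assumptions made up for the example. Pig would express the same steps as a short script and run them as MapReduce jobs over much bigger files.

```python
from collections import Counter

# Hypothetical log lines in "ip,url,status" CSV form (made up for illustration).
LOG_LINES = [
    "10.0.0.1,/index.html,200",
    "10.0.0.2,/index.html,200",
    "10.0.0.1,/missing,404",
    "10.0.0.3,/about.html,200",
]

def top_urls(lines):
    # Parse: split each CSV line into its three fields.
    records = [line.split(",") for line in lines]
    # Filter: keep only successful requests.
    ok = [r for r in records if r[2] == "200"]
    # Group: count hits per URL.
    hits = Counter(url for _ip, url, _status in ok)
    # Order: most hits first, URL as a tie-breaker.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))

result = top_urls(LOG_LINES)
```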

I suggest reading:
- the Google papers about MapReduce, BigTable and GFS,
- the papers about Hadoop,
- the papers about the many subprojects of Hadoop.

Please correct me if I'm wrong, I'm still a newbie here :)
(So far, the most I've done is a search engine with around 2 million
URLs indexed.)

I'm expecting to use it at work for log parsing (a website with millions
of hits/day).
But not yet, I'm just playing with Hadoop for now :)

Kerunix Flan
Laurent Laborde
