hadoop-common-user mailing list archives

From Enis Soztutar <enis.soz.nu...@gmail.com>
Subject Re: Nutch Extensions to MapReduce
Date Thu, 06 Mar 2008 13:13:52 GMT

Currently, Nutch is a fairly complex application that *uses* Hadoop as a
base for distributed computing and storage. In this regard, no part of
Nutch "extends" Hadoop. The core of MapReduce does indeed work with
<key,value> pairs, and Nutch uses specific <key,value> pairs such as
<url, CrawlDatum>.
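To make the <key,value> model concrete, here is a minimal plain-Java sketch of map, shuffle and reduce over <url, status> pairs, run entirely in memory. The `Status` enum is a hypothetical, heavily simplified stand-in for Nutch's `CrawlDatum`, and `KeyValueSketch` and its methods are illustrative names only; this is not the Hadoop or Nutch API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of map -> shuffle -> reduce over <url, status> pairs.
// NOT the Hadoop or Nutch API; Status is a toy stand-in for CrawlDatum.
class KeyValueSketch {

    // Hypothetical, much-simplified crawl status (the real CrawlDatum carries far more).
    enum Status { UNFETCHED, FETCHED }

    // Map: each input line is "url status"; emit one <url, status> pair per line.
    static List<Map.Entry<String, Status>> map(List<String> lines) {
        List<Map.Entry<String, Status>> out = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            out.add(Map.entry(parts[0], Status.valueOf(parts[1])));
        }
        return out;
    }

    // Shuffle: group all values by key, sorted by key.
    static Map<String, List<Status>> shuffle(List<Map.Entry<String, Status>> pairs) {
        Map<String, List<Status>> groups = new TreeMap<>();
        for (Map.Entry<String, Status> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce: merge every status seen for one url; FETCHED wins over UNFETCHED.
    static Status reduce(List<Status> values) {
        return values.contains(Status.FETCHED) ? Status.FETCHED : Status.UNFETCHED;
    }

    // Run the three phases and collect one merged <url, status> pair per url.
    static Map<String, Status> run(List<String> lines) {
        Map<String, Status> result = new TreeMap<>();
        for (Map.Entry<String, List<Status>> e : shuffle(map(lines)).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }
}
```

Because each reduce call sees one key together with all of its values, the merge rule stays local to a single URL; that locality is what lets a framework like Hadoop run many reducers in parallel.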

So, long story short, it depends on what you want to build. If you are
working on something that is not related to Nutch, you do not need it.
You can give further details about your project if you want more specific help.

Best wishes.

Naama Kraus wrote:
> Hi,
> I've seen in
> http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf
> (slide 12) that Nutch has extensions to MapReduce. I wanted to ask whether
> these are part of the Hadoop API or inside Nutch only.
> More specifically, I saw in
> http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/yahoo-sds.pdf
> (slide 15) that a MapReduce job outputs two files, each holding different
> <key,value> pairs. I'd be curious to know whether I can achieve that using the standard API.
> Thanks, Naama
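On the two-files question: one way to get this with Hadoop is an output that keeps one writer per named output and routes each record to the right one; later Hadoop releases also ship a `MultipleOutputs` helper for this purpose. The plain-Java sketch below only models the routing idea in memory, with one buffer standing in for each file. `TwoOutputsSketch`, its methods, and the output names `links` and `text` are hypothetical and are not the Hadoop API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of one job writing two named outputs, each holding
// a different <key,value> type. NOT the Hadoop API; buffers stand in for files.
class TwoOutputsSketch {

    // One buffer per named output, created on first use.
    private final Map<String, StringBuilder> outputs = new LinkedHashMap<>();

    // Route a <key,value> record, tab-separated, to the named output.
    void write(String outputName, Object key, Object value) {
        outputs.computeIfAbsent(outputName, n -> new StringBuilder())
               .append(key).append('\t').append(value).append('\n');
    }

    // Simulated reduce for one page: a <url, outlink count> record goes to "links"
    // and a <url, text> record goes to "text", so the two outputs hold different types.
    void reduce(String url, String pageText, int outlinks) {
        write("links", url, outlinks);
        write("text", url, pageText);
    }

    // Everything written so far to one named output ("" if nothing was written).
    String contents(String outputName) {
        StringBuilder sb = outputs.get(outputName);
        return sb == null ? "" : sb.toString();
    }
}
```

Each named output ends up with a single record type, which is the property the question asks about; a real job would typically write each named output to its own file under the job's output directory.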
