hadoop-common-user mailing list archives

From "Naama Kraus" <naamakr...@gmail.com>
Subject Re: Nutch Extensions to MapReduce
Date Thu, 06 Mar 2008 13:22:57 GMT
Well, I was not actually thinking of using Nutch.
To be concrete, I was wondering whether a MapReduce job could output
multiple files, each holding different <key,value> pairs. I got the
impression from slide 15 of
http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/yahoo-sds.pdf
that Nutch does this, but maybe I misunderstood.
Is it Nutch-specific, or achievable using the Hadoop API? Would multiple
different reducers do the trick?
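For illustration only, here is a minimal plain-Java sketch of the idea being asked about: one reduce pass that writes two differently shaped kinds of <key,value> pairs to separate named outputs. It uses no Hadoop classes at all. The class `MultiOutputSketch`, the output names ("frequent", "rare", "lengths") and the threshold routing rule are hypothetical; in a real job each named output would correspond to a separate output file produced through Hadoop's output machinery, not an in-memory map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration only: no Hadoop classes are used.
public class MultiOutputSketch {

    // Each named "output" stands in for a separate output file of one job.
    // Map key = output name, map value = the <key,value> lines written to it.
    private final Map<String, List<String>> outputs = new TreeMap<>();

    // Append one <key,value> pair (tab-separated) to the named output,
    // creating that output on first use.
    public void collect(String outputName, String key, String value) {
        outputs.computeIfAbsent(outputName, n -> new ArrayList<>()).add(key + "\t" + value);
    }

    public Map<String, List<String>> getOutputs() {
        return outputs;
    }

    // A toy "reduce": sums the counts for one key, then routes the <word, total>
    // pair to "frequent" or "rare" using a made-up threshold rule.
    public void reduce(String key, List<Integer> counts, int threshold) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        String target = total >= threshold ? "frequent" : "rare";
        collect(target, key, Integer.toString(total));
        // A second, differently shaped pair from the same call: <length, word>.
        collect("lengths", Integer.toString(key.length()), key);
    }

    public static void main(String[] args) {
        // Grouped map output, as a reducer would see it after the shuffle.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        grouped.put("hadoop", List.of(3, 4));
        grouped.put("nutch", List.of(1));
        grouped.put("mapreduce", List.of(2, 2, 2));

        MultiOutputSketch job = new MultiOutputSketch();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            job.reduce(e.getKey(), e.getValue(), 5);
        }

        // Print each output "file" followed by its lines.
        for (Map.Entry<String, List<String>> out : job.getOutputs().entrySet()) {
            System.out.println("== " + out.getKey() + " ==");
            out.getValue().forEach(System.out::println);
        }
    }
}
```

The sketch routes from inside a single reducer rather than running several reducers with different logic; it says nothing about how Hadoop itself names, writes or commits such files.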

Thanks for offering to help. I may have more concrete details about what I
am trying to implement later on; right now I am basically learning.

Naama

On Thu, Mar 6, 2008 at 3:13 PM, Enis Soztutar <enis.soz.nutch@gmail.com> wrote:

> Hi,
>
> Currently Nutch is a fairly complex application that *uses* Hadoop as a
> base for distributed computing and storage. In this regard, no part of
> Nutch "extends" Hadoop. The core of MapReduce does indeed work with
> <key,value> pairs, and Nutch uses specific <key,value> pairs such as
> <url, CrawlDatum>, etc.
>
> So, long story short, it depends on what you want to build. If you are
> working on something that is not related to Nutch, you do not need it.
> You can give further info about your project if you want extended help.
>
> best wishes.
> Enis
>
> Naama Kraus wrote:
> > Hi,
> >
> > I've seen in
> > http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf
> > (slide 12) that Nutch has extensions to MapReduce. I wanted to ask whether
> > these are part of the Hadoop API or only inside Nutch.
> >
> > More specifically, I saw in
> > http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/yahoo-sds.pdf
> > (slide 15) that MapReduce outputs two files, each holding different
> > <key,value> pairs. I'd be curious to know if I can achieve that using the
> > standard API.
> >
> > Thanks, Naama
> >
> >
>



-- 
oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales. If you
want them to be more intelligent, read them more fairy tales." (Albert
Einstein)
