hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: Is it possible to write file output in Map phase once and write another file output in Reduce phase?
Date Fri, 10 Dec 2010 16:15:06 GMT
That is definitely possible, but may not be very desirable.

Take a look at the Bixo project for a full-scale crawler.  There is a lot of
subtlety in fetching URLs, due to the varying quality of different sites and
the interaction with the crawl throttling that robots.txt considerations
require.

http://bixo.101tec.com/
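
To give a feel for the robots.txt side of this, here is a minimal sketch
using Python's standard urllib.robotparser. It is not how Bixo does it; the
robots.txt content, the "example-bot" user agent, the example.com URLs, and
the helper names build_policy and fetch_plan are all made up for
illustration.

```python
import urllib.robotparser

# Hypothetical robots.txt content; a real crawler would download it per host.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def build_policy(robots_txt):
    """Parse robots.txt text into a RobotFileParser without network access."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def fetch_plan(parser, user_agent, urls):
    """Return (url, delay_seconds) for each URL the policy allows."""
    # crawl_delay() returns None when no Crawl-delay applies to this agent.
    delay = parser.crawl_delay(user_agent) or 0
    return [(url, delay) for url in urls if parser.can_fetch(user_agent, url)]


policy = build_policy(ROBOTS_TXT)
plan = fetch_plan(policy, "example-bot", [
    "http://news.example.com/articles/1",
    "http://news.example.com/private/draft",
])
```

Here plan keeps only the allowed URL, paired with the delay the site asks
for. A real fetcher would also have to enforce that delay per host across
all of its concurrent tasks, which is where much of the subtlety comes from.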

On Thu, Dec 9, 2010 at 11:27 PM, edward choi <mp2893@gmail.com> wrote:

> So my design is:
> Map phase ==> crawl news articles, process text, write the result to a file.
>        II
>        II     pass (term, term_frequency) pair to the Reducer
>        II
>        V
> Reduce phase ==> Merge the (term, term_frequency) pair and create a dictionary
>
> Is this at all possible? Or is it inherently impossible due to the structure
> of Hadoop?
>
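
The design quoted above can be sketched in plain Python to show the data
flow. This simulates the two phases in a single process and does not use the
Hadoop API; the function names map_article and reduce_terms, the sample
articles, and the one-file-per-document naming are assumptions made for
illustration.

```python
import os
import re
import tempfile
from collections import Counter, defaultdict


def map_article(doc_id, text, out_dir):
    """Map step: write processed text to a side file, emit (term, tf) pairs."""
    terms = re.findall(r"[a-z]+", text.lower())
    path = os.path.join(out_dir, f"{doc_id}.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(" ".join(terms))
    return list(Counter(terms).items())


def reduce_terms(pairs):
    """Reduce step: merge (term, tf) pairs into one dictionary of totals."""
    totals = defaultdict(int)
    for term, tf in pairs:
        totals[term] += tf
    return dict(totals)


# Hypothetical input standing in for crawled news articles.
articles = {
    "a1": "Hadoop runs map tasks",
    "a2": "Map tasks feed reduce tasks",
}

with tempfile.TemporaryDirectory() as out_dir:
    emitted = []
    for doc_id, text in articles.items():
        emitted.extend(map_article(doc_id, text, out_dir))
    side_files = sorted(os.listdir(out_dir))

dictionary = reduce_terms(emitted)
```

Each map call produces two outputs: a side file holding the processed text,
and the pairs handed on to the reduce step. On a real cluster a map task can
be retried or run speculatively, so side files written this way would need
names that are safe to write more than once.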
