hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: Is it possible to write file output in Map phase once and write another file output in Reduce phase?
Date Fri, 10 Dec 2010 16:15:06 GMT
That is definitely possible, but may not be very desirable.

Take a look at the Bixo project for a full-scale crawler.  There is a lot of
subtlety in fetching URLs, due to the varying quality of different sites and
to the way robots.txt considerations can choke a crawl.
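
To make the shape of the design in the quoted message concrete, here is a
minimal sketch in plain Python. It is not Hadoop code and uses no Hadoop API:
it only simulates a map step that writes a side file per article and emits
(term, term_frequency) pairs, and a reduce step that merges them into a
dictionary file. The names map_article, reduce_term and run_job are
hypothetical, the tokenizer is a simple assumption, and crawling is left out
(the input is assumed to be already-fetched article text).

```python
import os
import re
import tempfile
from collections import Counter, defaultdict


def map_article(article_id, text, side_dir):
    """Map step: process one article, write a side file, yield (term, tf) pairs."""
    # Assumed tokenizer: lowercase runs of letters and digits.
    terms = re.findall(r"[a-z0-9]+", text.lower())
    # Side output: one file per input record, so two map calls never
    # write to the same path.
    side_path = os.path.join(side_dir, f"article-{article_id}.txt")
    with open(side_path, "w", encoding="utf-8") as f:
        f.write(" ".join(terms))
    for term, tf in Counter(terms).items():
        yield term, tf


def reduce_term(term, frequencies):
    """Reduce step: merge all frequencies seen for one term."""
    return term, sum(frequencies)


def run_job(articles, side_dir, dict_path):
    """Simulate map, shuffle (group by term), and reduce in one process."""
    grouped = defaultdict(list)
    for article_id, text in articles.items():
        for term, tf in map_article(article_id, text, side_dir):
            grouped[term].append(tf)
    dictionary = dict(reduce_term(t, fs) for t, fs in sorted(grouped.items()))
    # Second file output: the merged dictionary written by the reduce side.
    with open(dict_path, "w", encoding="utf-8") as f:
        for term, total in dictionary.items():
            f.write(f"{term}\t{total}\n")
    return dictionary


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    sample = {"a": "Hadoop map reduce map", "b": "reduce hadoop"}
    print(run_job(sample, out_dir, os.path.join(out_dir, "dictionary.tsv")))
```

In a real job the map-side file needs a name that is unique per task attempt,
since tasks can be retried or run speculatively and two attempts would
otherwise write to the same path.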


On Thu, Dec 9, 2010 at 11:27 PM, edward choi <mp2893@gmail.com> wrote:

> So my design is:
> Map phase ==> crawl news articles, process text, write the result to a file.
>        ||
>        ||     pass (term, term_frequency) pair to the Reducer
>        ||
>        V
> Reduce phase ==> Merge the (term, term_frequency) pair and create a dictionary
> Is this at all possible? Or is it inherently impossible due to the structure
> of Hadoop?
