hadoop-common-dev mailing list archives

From zehua <bradjo...@yahoo.com>
Subject Duplicate Input and duplicate result
Date Mon, 08 Dec 2008 22:57:59 GMT

We use Hadoop and Nutch to crawl the website. We fetch the URL list from
a SQL server and split it across the cluster. When we increase the
number of mappers, the number of duplicate results increases. For example,
with 2 mappers, each record may appear twice; with 8 mapper instances,
each result is duplicated 8 times. Any ideas about this? Where could
the problem be?
