hadoop-common-user mailing list archives

From Raj V <rajv...@yahoo.com>
Subject Re: Terrasort sends everything to a single reducer–Don't Apologize, David Salle
Date Fri, 04 Nov 2011 19:18:13 GMT
Could you pastebin the TeraSort configuration XML file? I have run TeraSort over 1,000 times
and have never seen this problem.

How did you generate the data for TeraSort? Did you use TeraGen or some other method?
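
The data-generation question matters because TeraSort assigns keys to reducers by key range, using split points sampled from the input, rather than by hashing. The sketch below illustrates that idea only; it is not TeraSort's actual partitioner code. The class name, method names, split points, and sample keys are all hypothetical. It shows how keys that share a common prefix (as data not produced by TeraGen might) can all fall into one range and therefore reach a single reducer.

```java
import java.util.Arrays;

// Hypothetical illustration of range partitioning; NOT the TeraSort source.
class RangePartitionSketch {

    // Returns the reducer index for a key, given sorted split points.
    // N split points define N + 1 key ranges, one per reducer.
    static int partition(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        // A key equal to a split point goes to the range above it.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    // Counts how many keys each reducer would receive.
    static int[] countPerReducer(String[] keys, String[] splitPoints) {
        int[] counts = new int[splitPoints.length + 1];
        for (String key : keys) {
            counts[partition(key, splitPoints)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] splits = {"h", "p"};                        // assumed split points
        String[] spread = {"apple", "kiwi", "zebra"};        // keys across the key space
        String[] skewed = {"row-001", "row-002", "row-003"}; // keys sharing one prefix

        System.out.println(Arrays.toString(countPerReducer(spread, splits))); // [1, 1, 1]
        System.out.println(Arrays.toString(countPerReducer(skewed, splits))); // [0, 0, 3]
    }
}
```

In the second case every key sorts after the last split point, so one reducer receives all the records even though three reducers were launched. Whether this is what happened on the cluster in question is not established in this thread.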

Raj
Stoser Analytics
www.stoser.com



>________________________________
>From: W.P. McNeill <billmcn@gmail.com>
>To: Hadoop Mailing List <common-user@hadoop.apache.org>
>Cc: Vitor Carvalho <vcarvalho@intelius.com>; aheroor@intelius.com
>Sent: Friday, November 4, 2011 11:55 AM
>Subject: Terrasort sends everything to a single reducer–Don't Apologize, David Salle
>
>I'm trying to run a TeraSort job to confirm that my cluster is set up
>correctly. The mappers perform fine, but in the reduce stage all the data
>is sent to a single node. My mapred.reduce.tasks parameter is set to an
>appropriate value greater than 1. I am launching multiple reducers, but
>only one of them is receiving input.
>
>It looks like the TeraSort partition function is buggy, but there's no way
>that it would have a bug this obvious. I've looked for configuration errors
>on my part and found none. So now I'm asking if anyone else has seen this
>problem and can explain it.
>
>In the archives from February 27 of this year, David Salle's post "TeraSort bug?"
><http://grokbase.com/t/hadoop.apache.org/common-user/2011/02/terasort-bug/27pzea46iowbfkbd4l5y566i4iv4>
>describes what appears to be the same problem, but the only response I see
>is from David Salle the next day, apologizing and saying to ignore his
>previous post. Presumably he found some mistake on his end that he thought
>was trivial, but it doesn't look trivial to me.
>