hadoop-common-user mailing list archives

From Andreas Kostyrka <andr...@kostyrka.org>
Subject Re: Patch
Date Fri, 13 Jun 2008 08:04:06 GMT
Sorry for replying to the private email on the mailing list, but I strongly 
believe in leaving the next guy something to google ;)

Anyway, as you seem to be knowledgeable about sorting, one question:

Does Hadoop provide all key/value tuples for a given key in one batch to the 
reducer, or not?

TIA,

Andreas

On Friday 13 June 2008 02:48:52 you wrote:
> Great deal; thanks for sending it to me.
>
> This has exactly the same pattern described in the JIRA
> (HADOOP-3442); the partition that fails is nearly sorted and it's
> selected one of the largest values as its pivot.
>
> The fix is checked into the 0.17 branch; if you check it out and
> deploy it, your jobs should finish without causing the
> StackOverflowError. If you're noticing inordinately long sort times
> for your job (i.e. this is a common pattern for your data), then you
> might consider applying HADOOP-3308 and HADOOP-3442 (the former so
> the latter applies cleanly). Really sorry you hit this; let me know
> if the sort times with the 0.17.1 branch are inordinately long, so
> this can get another iteration if it needs it. -C
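
For the next guy googling this: the failure pattern described above (a nearly
sorted partition plus a pivot that is one of the largest values) can be
reproduced with a toy quicksort. The sketch below is NOT Hadoop's sort code and
not the HADOOP-3442 fix; the class name, the last-element pivot rule, and the
input sizes are all assumptions made for illustration. It only measures how
deep the recursion goes, which is what eventually triggers a StackOverflowError
on large enough inputs.

```java
import java.util.Random;

// Hypothetical illustration only; not taken from the Hadoop source tree.
public class PivotDepthSketch {
    // Deepest recursion level reached during the most recent sort.
    static int maxDepth;

    // Naive quicksort that always picks the last element as the pivot.
    static void naiveSort(int[] a, int lo, int hi, int depth) {
        if (lo >= hi) return;
        maxDepth = Math.max(maxDepth, depth);
        int p = partition(a, lo, hi);
        naiveSort(a, lo, p - 1, depth + 1);
        naiveSort(a, p + 1, hi, depth + 1);
    }

    // Lomuto partition around a[hi]; returns the pivot's final index.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                swap(a, i, j);
                i++;
            }
        }
        swap(a, i, hi);
        return i;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // Sorts a copy of the input and reports the maximum recursion depth.
    static int depthFor(int[] input) {
        int[] copy = input.clone();
        maxDepth = 0;
        naiveSort(copy, 0, copy.length - 1, 1);
        return maxDepth;
    }

    // Already sorted input: the pivot is always the largest value.
    static int[] sortedInput(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = i;
        return a;
    }

    // Same values in random order (Fisher-Yates shuffle with a fixed seed).
    static int[] shuffledInput(int n, long seed) {
        int[] a = sortedInput(n);
        Random rnd = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            swap(a, i, rnd.nextInt(i + 1));
        }
        return a;
    }

    public static void main(String[] args) {
        int n = 2000;
        System.out.println("sorted input depth:   " + depthFor(sortedInput(n)));
        System.out.println("shuffled input depth: " + depthFor(shuffledInput(n, 42L)));
    }
}
```

On sorted input every partition step peels off only the pivot, so the recursion
depth grows linearly with the input size, while on shuffled input it stays
close to logarithmic. Real implementations usually guard against this with a
better pivot choice or a depth limit; which approach the actual patch takes is
documented in the JIRA, not here.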
