hadoop-common-dev mailing list archives

From Devaraj Das <d...@hortonworks.com>
Subject Re: Question about Hadoop-8192 and rackToBlocks ordering
Date Thu, 22 Mar 2012 21:40:45 GMT

On Mar 22, 2012, at 11:45 AM, Amir Sanjar wrote:

> Thanks for the reply Robert,
> However I believe the main design issue is:
> If there is a rack (listed in the rackToBlocks HashMap) that contains all the
> blocks (stored in the blockToNode HashMap), then regardless of the order, the split
> operation terminates after that rack gets processed. That means the remaining
> racks (listed in the rackToBlocks HashMap) will not get processed. For more
> details, look at the file CombineFileInputFormat.java, method getMoreSplits(),
> while loop starting at line 344.

I haven't looked at the code much yet, but I'm trying to understand your question - what
issue are you trying to bring out? Is it overloading one task with too much input (there
is a min/max limit on that, though)?
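To make the described behavior concrete, here is a minimal Java sketch of the loop as Amir describes it. This is a hypothetical simplification, not the actual getMoreSplits() code, and the rack and block names are invented for illustration:

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class SplitOrderSketch {
    // Simplified model of the loop described above: walk racks in map
    // iteration order, emit one "split" per rack that contributes new
    // blocks, and stop once every block has been assigned.
    static int countSplits(Map<String, Set<String>> rackToBlocks, Set<String> allBlocks) {
        Set<String> assigned = new HashSet<>();
        int splits = 0;
        for (Set<String> blocks : rackToBlocks.values()) {
            Set<String> fresh = new HashSet<>(blocks);
            fresh.removeAll(assigned);
            if (fresh.isEmpty()) continue;
            assigned.addAll(fresh);
            splits++;
            if (assigned.equals(allBlocks)) break; // early exit: remaining racks are skipped
        }
        return splits;
    }

    public static void main(String[] args) {
        Set<String> all = Set.of("blk_1", "blk_2");
        // /rackA holds every block; /rackB holds only blk_2.
        Map<String, Set<String>> order1 = new LinkedHashMap<>();
        order1.put("/rackA", Set.of("blk_1", "blk_2"));
        order1.put("/rackB", Set.of("blk_2"));
        Map<String, Set<String>> order2 = new LinkedHashMap<>();
        order2.put("/rackB", Set.of("blk_2"));
        order2.put("/rackA", Set.of("blk_1", "blk_2"));
        System.out.println(countSplits(order1, all)); // 1: /rackA covers everything first
        System.out.println(countSplits(order2, all)); // 2: /rackB first, then /rackA
    }
}
```

With the covering rack first, the loop terminates after one split; with it last, two splits come out. That matches the 1-vs-2 split counts reported between JDKs, since HashMap iteration order may differ across implementations.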

> Best Regards
> Amir Sanjar
> Linux System Management Architect and Lead
> IBM Senior Software Engineer
> Phone# 512-286-8393
> Fax#      512-838-8858
> From:	Robert Evans <evans@yahoo-inc.com>
> To:	"common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>
> Date:	03/22/2012 11:57 AM
> Subject:	Re: Question about Hadoop-8192 and rackToBlocks ordering
> If it really is the ordering of the hash map, I would say no, it should not,
> and the code should be updated. If ordering matters, we need to use a map
> that guarantees a given order, and HashMap is not one of them.
> --Bobby Evans
> On 3/22/12 7:24 AM, "Kumar Ravi" <gokumarravi@gmail.com> wrote:
> Hello,
> We have been looking at IBM JDK JUnit failures on Hadoop-1.0.1
> independently and have run into the same failures as reported in this JIRA.
> I have a question based upon what I have observed below.
> We started debugging the problems in the testcase -
> org.apache.hadoop.mapred.lib.TestCombineFileInputFormat
> The testcase fails because the number of splits returned back from
> CombineFileInputFormat.getSplits() is 1 when using IBM JDK whereas the
> expected return value is 2.
> So far, we have found that the reason for this difference in the number of splits
> is that the elements of the rackToBlocks hashmap get created in the reverse of
> the order in which the Sun JDK creates them.
> The question I have at this point is -- should there be a strict dependency
> on the order in which the rackToBlocks hashmap gets populated to determine
> the number of splits that should get created in a hadoop cluster? Is this
> working as designed?
> Regards,
> Kumar
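Bobby's point that HashMap guarantees no particular iteration order, while other Map implementations do, can be illustrated with a short sketch. The rack and block names here are invented; this is only a demonstration of the standard java.util behavior, not Hadoop code:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OrderedMapDemo {
    // Returns the keys in whatever order the supplied map iterates them.
    static List<String> iterationOrder(Map<String, String> m) {
        return new ArrayList<>(m.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap: iteration order equals insertion order on every JDK.
        Map<String, String> linked = new LinkedHashMap<>();
        linked.put("/rack2", "blk_2");
        linked.put("/rack1", "blk_1");
        System.out.println(iterationOrder(linked)); // [/rack2, /rack1]

        // TreeMap: iteration order equals sorted key order, also JDK-independent.
        Map<String, String> sorted = new TreeMap<>(linked);
        System.out.println(iterationOrder(sorted)); // [/rack1, /rack2]

        // Plain HashMap makes no ordering promise at all, so any code whose
        // result depends on its iteration order can differ between JDK vendors.
    }
}
```

If the split count must be deterministic, swapping the rackToBlocks HashMap for a LinkedHashMap (or TreeMap) would remove the JDK-dependent behavior, as Bobby suggests.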
