Delivered-To: mailing list user@hadoop.apache.org
From: Harsh J
Date: Thu, 19 Sep 2013 14:03:46 +0530
Subject: Re: Issue: Max block location exceeded for split error when running hive
To: user@hadoop.apache.org

Are you using a CombineFileInputFormat or similar input format then, perhaps?

On Thu, Sep 19, 2013 at 1:29 PM, Murtaza Doctor wrote:
> We are using the default replication factor of 3. When new files are put on
> HDFS we never override the replication factor. When there is more data
> involved it fails on a larger split size.
>
> On Wed, Sep 18, 2013 at 6:34 PM, Harsh J wrote:
>>
>> Do your input files carry a replication factor of 10+? That could be
>> one cause behind this.
>>
>> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor wrote:
>> > Folks,
>> >
>> > Anyone run into this issue before:
>> > java.io.IOException: Max block location exceeded for split: Paths:
>> > "/foo/bar...."
>> > ....
>> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
>> > splitsize: 15 maxsize: 10
>> >     at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
>> >     at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
>> >     at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
>> >     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
>> >     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
>> >     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
>> >     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at javax.security.auth.Subject.doAs(Subject.java:415)
>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> >     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
>> >     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>> >     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at javax.security.auth.Subject.doAs(Subject.java:415)
>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>> >     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>> >
>> > When we set the property higher than the value it failed on, as
>> > suggested:
>> >     mapreduce.job.max.split.locations = (greater than the reported splitsize)
>> > the job runs successfully.
>> >
>> > I am trying to dig up additional documentation on this, since the
>> > default seems to be 10; I am not sure how that limit was chosen.
>> > Additionally, what is the recommended value, and what factors does it
>> > depend on?
>> >
>> > We are running YARN; the actual query is Hive on CDH 4.3, with Hive
>> > version 0.10.
>> >
>> > Any pointers in this direction will be helpful.
>> >
>> > Regards,
>> > md
>>
>> --
>> Harsh J

--
Harsh J
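[For readers who land on this thread later: the guard that throws here is a simple count check at job-submission time. JobSplitWriter compares each split's block-location count against the configured maximum (mapreduce.job.max.split.locations, default 10, per the thread). The Python below is purely an illustrative sketch of that logic, not Hadoop's actual API; the function name and node names are made up.]

```python
DEFAULT_MAX_SPLIT_LOCATIONS = 10  # the default limit discussed in the thread


def check_split_locations(locations, max_locations=DEFAULT_MAX_SPLIT_LOCATIONS):
    """Illustrative stand-in for the guard in JobSplitWriter: reject any
    split that reports more block locations than the configured maximum."""
    if len(locations) > max_locations:
        raise IOError(
            "Max block location exceeded for split: "
            "splitsize: %d maxsize: %d" % (len(locations), max_locations))
    return locations


# With the default replication factor of 3, a plain TextInputFormat split
# has roughly 3 locations and passes; a combine-style split that pulls
# blocks from 15 datanodes (the "splitsize: 15" in the trace above) would
# raise exactly the "splitsize: 15 maxsize: 10" error shown in the thread.
check_split_locations(["dn1", "dn2", "dn3"])
```

[This also shows why the workaround in the thread works: either the location count per split comes down (lower replication, or fewer blocks combined per split), or the limit goes up, e.g. a session-level `SET mapreduce.job.max.split.locations=30;` in Hive, where 30 is an illustrative value that merely needs to exceed the reported splitsize.]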