hadoop-hdfs-user mailing list archives

From Murtaza Doctor <murtazadoc...@gmail.com>
Subject Re: Issue: Max block location exceeded for split error when running hive
Date Thu, 19 Sep 2013 18:00:37 GMT
It used to throw a warning in 1.0.3 and has now become an IOException. I was
more trying to figure out why it is exceeding the limit even though the
replication factor is 3. Also, Hive may use CombineInputSplit or some
version of it; are we saying it will always exceed the limit of 10?
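
My rough understanding of the arithmetic (a hypothetical sketch, not
Hive's actual code): a combined split reports the union of its blocks'
hosts as its locations, so blocks spread across enough datanodes can
push past 10 even at replication 3:

    // Hypothetical sketch: why a combined split can report more
    // locations than any single file's replication factor.
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    public class SplitLocationCount {
        public static void main(String[] args) {
            // Say a combined split merges blocks from 5 small files,
            // each block on 3 datanodes (replication factor 3).
            String[][] blockHosts = {
                {"dn01", "dn02", "dn03"},
                {"dn04", "dn05", "dn06"},
                {"dn07", "dn08", "dn09"},
                {"dn10", "dn11", "dn12"},
                {"dn13", "dn14", "dn15"},
            };
            // The split's location list is the union of its blocks' hosts.
            Set<String> locations = new HashSet<String>();
            for (String[] hosts : blockHosts) {
                locations.addAll(Arrays.asList(hosts));
            }
            // Prints 15 -- the same "splitsize: 15 maxsize: 10" shape as
            // the error below, with no block exceeding 3 replicas.
            System.out.println(locations.size());
        }
    }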


On Thu, Sep 19, 2013 at 10:05 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> We have this job-submit property buried in Hive that defaults to 10. We
> should make that configurable.
>
>
> On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <harsh@cloudera.com> wrote:
>
>> Do your input files carry a replication factor of 10+? That could be
>> one cause behind this.
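>>
>> One quick way to check (a rough sketch using the HDFS Java API; pass
>> one of the job's input paths as the argument):
>>
>>     // Sketch: print the replication factor of a given HDFS file.
>>     import org.apache.hadoop.conf.Configuration;
>>     import org.apache.hadoop.fs.FileStatus;
>>     import org.apache.hadoop.fs.FileSystem;
>>     import org.apache.hadoop.fs.Path;
>>
>>     public class CheckReplication {
>>         public static void main(String[] args) throws Exception {
>>             FileSystem fs = FileSystem.get(new Configuration());
>>             FileStatus st = fs.getFileStatus(new Path(args[0]));
>>             System.out.println("replication: " + st.getReplication());
>>         }
>>     }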
>>
>> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <murtazadoctor@gmail.com>
>> wrote:
>> > Folks,
>> >
>> > Anyone run into this issue before:
>> > java.io.IOException: Max block location exceeded for split: Paths:
>> > "/foo/bar...."
>> > ....
>> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
>> > splitsize: 15 maxsize: 10
>> >   at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
>> >   at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
>> >   at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
>> >   at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
>> >   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
>> >   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
>> >   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
>> >   at java.security.AccessController.doPrivileged(Native Method)
>> >   at javax.security.auth.Subject.doAs(Subject.java:415)
>> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> >   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
>> >   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>> >   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>> >   at java.security.AccessController.doPrivileged(Native Method)
>> >   at javax.security.auth.Subject.doAs(Subject.java:415)
>> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> >   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>> >   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>> >   at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>> >
>> > When we set the property to something higher than the value it failed
>> > on, as suggested:
>> > mapreduce.job.max.split.locations = <value above the reported splitsize>
>> > the job runs successfully.
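>> >
>> > A minimal sketch of setting it programmatically (assuming the job is
>> > built with the Java API; the value 20 is just an example, and from the
>> > Hive CLI the same property can be set with SET before the query):
>> >
>> >     import org.apache.hadoop.conf.Configuration;
>> >
>> >     public class RaiseSplitLocations {
>> >         public static void main(String[] args) {
>> >             Configuration conf = new Configuration();
>> >             // Lift the per-split location cap above the reported
>> >             // splitsize (15 in the error above).
>> >             conf.setInt("mapreduce.job.max.split.locations", 20);
>> >             System.out.println(
>> >                 conf.get("mapreduce.job.max.split.locations"));
>> >         }
>> >     }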
>> >
>> > I am trying to dig up additional documentation on this, since the
>> > default seems to be 10; I am not sure how that limit was chosen.
>> > Additionally, what is the recommended value, and what factors does it
>> > depend on?
>> >
>> > We are running YARN; the actual query is Hive on CDH 4.3, with Hive
>> > version 0.10.
>> >
>> > Any pointers in this direction will be helpful.
>> >
>> > Regards,
>> > md
>>
>>
>>
>> --
>> Harsh J
>>
>
>
