hadoop-user mailing list archives

From Matt Davies <m...@mattdavies.net>
Subject Re: Issue: Max block location exceeded for split error when running hive
Date Fri, 20 Sep 2013 03:16:55 GMT
Thanks Rahul. Our ops people have implemented the config change.
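The config change amounts to raising mapreduce.job.max.split.locations. A minimal sketch of the property as it might appear in mapred-site.xml or hive-site.xml (the value 50 is a placeholder; per Rahul's advice below, set it to at least the number of datanodes in the cluster):

```xml
<!-- Sketch: raise the per-split location cap above the default of 10.   -->
<!-- The value 50 is a placeholder; use at least your datanode count.    -->
<property>
  <name>mapreduce.job.max.split.locations</name>
  <value>50</value>
</property>
```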

On Thursday, September 19, 2013, Rahul Jain wrote:

> Matt,
>
> It would be better for you to do a global config update: set
> mapreduce.job.max.split.locations to at least the number of datanodes in
> your cluster, either in hive-site.xml or mapred-site.xml. In either case,
> this is a sensible configuration update if you are going to use
> CombineFileInputFormat to read input data in hive.
>
> -Rahul
>
>
> On Thu, Sep 19, 2013 at 3:31 PM, Matt Davies <matt@mattdavies.net> wrote:
>
> What are the ramifications of setting a hard-coded value in our scripts
> and then changing parameters that influence the input data size? E.g., one
> day I want to run across 1 day's worth of data, and another day I want to
> run against 30 days.
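One way to sidestep the hard-coding concern raised above is to set the property per script rather than globally; a sketch at the top of a Hive script (the value 50 is a placeholder sized for the largest expected run):

```sql
-- Sketch: per-session override instead of a hard-coded global value.
-- 50 is a placeholder; size it for the largest input (e.g. 30 days of data).
SET mapreduce.job.max.split.locations=50;
```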
>
>
>
>
> On Thu, Sep 19, 2013 at 3:11 PM, Rahul Jain <rjain7@gmail.com> wrote:
>
> I am assuming you have looked at this already:
>
> https://issues.apache.org/jira/browse/MAPREDUCE-5186
>
> You do have a workaround here: increase the
> mapreduce.job.max.split.locations value in the hive configuration. Or do
> we need more than that here?
>
> -Rahul
>
>
> On Thu, Sep 19, 2013 at 11:00 AM, Murtaza Doctor <murtazadoctor@gmail.com> wrote:
>
> It used to throw a warning in 1.0.3 and it has now become an IOException. I
> was more trying to figure out why it is exceeding the limit even though the
> replication factor is 3. Also, Hive may use CombineFileInputFormat or some
> version of it; are we saying it will always exceed the limit of 10?
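The arithmetic behind that question can be sketched as follows (the block count of 5 is an assumption; CombineFileInputFormat decides it from the actual input):

```python
# Sketch: why a combined split can exceed the default 10-location cap
# even when the replication factor is only 3.
replication = 3                # HDFS replication factor of the input files
blocks_in_combined_split = 5   # assumed: blocks merged into one combine split
max_split_locations = 10       # default mapreduce.job.max.split.locations

# Each block contributes up to `replication` distinct hosts to the split,
# so the location count grows with every block the split combines.
locations = replication * blocks_in_combined_split
print(locations, ">", max_split_locations, "->", locations > max_split_locations)
```

This matches the "splitsize: 15 maxsize: 10" in the stack trace below: 5 blocks x 3 replicas yields 15 locations against a cap of 10.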
>
>
> On Thu, Sep 19, 2013 at 10:05 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
> We have this job submit property buried in hive that defaults to 10. We
> should make that configurable.
>
>
> On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <harsh@cloudera.com> wrote:
>
> Do your input files carry a replication factor of 10+? That could be
> one cause behind this.
>
> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <murtazadoctor@gmail.com>
> wrote:
> > Folks,
> >
> > Any one run into this issue before:
> > java.io.IOException: Max block location exceeded for split: Paths:
> > "/foo/bar...."
> > ....
> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
> > splitsize: 15 maxsize: 10
> > at
> >
> org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
> > at
> >
> org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
> > at
> >
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
> > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:415)
> > at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> > at org.apac
>
>
