Subject: Re: Issue: Max block location exceeded for split error when running hive
From: Matt Davies <matt@mattdavies.net>
To: user@hadoop.apache.org
Date: Thu, 19 Sep 2013 21:16:55 -0600

Thanks, Rahul. Our ops people have implemented the config change.

On Thursday, September 19, 2013, Rahul Jain wrote:
Matt,

It would be better for you to do a global config update: set mapreduce.job.max.split.locations to at least the number of datanodes in your cluster, in either hive-site.xml or mapred-site.xml. In either case, this is a sensible configuration update if you are going to use CombineFileInputFormat to read input data in Hive.

-Rahul
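
[Archive note: a minimal sketch of the change Rahul describes, as it might look in mapred-site.xml or hive-site.xml. The value 50 is a placeholder, not from the thread; set it to at least the number of datanodes in your cluster.]

    <!-- Cap on block locations recorded per input split (default 10).
         50 is a placeholder; use at least your datanode count. -->
    <property>
      <name>mapreduce.job.max.split.locations</name>
      <value>50</value>
    </property>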


On Thu, Sep 19, 2013 at 3:31 PM, Matt Davies <matt@mattdavies.net> wrote:
What are the ramifications of setting a hard-coded value in our scripts and then changing parameters which influence the input data size? I.e., I want to run across 1 day's worth of data, then a different day I want to run against 30 days?
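
[Archive note: one hedged answer to the hard-coding concern is to scope the override to the script rather than the cluster. The Hive CLI lets a script set Hadoop job properties with SET; 50 is again a placeholder.]

    -- Sketch: per-job override at the top of a Hive script.
    -- Keep the value at or above the cluster's datanode count.
    SET mapreduce.job.max.split.locations=50;

Since a split's location list can never name more hosts than the cluster has datanodes, a value tied to cluster size rather than input size should hold whether the job scans 1 day or 30.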




On Thu, Sep 19, 2013 at 3:11 PM, Rahul Jain <rjain7@gmail.com> wrote:
I am assuming you have looked at this already:
https://issues.apache.org/jira/browse/MAPREDUCE-5186

You do have a workaround here to increase the mapreduce.job.max.split.locations value in the Hive configuration, or do we need more than that here?

-Rahul


On Thu, Sep 19, 2013 at 11:00 AM, Murtaza Doctor <murtazadoctor@gmail.com> wrote:
It used to throw a warning in 1.0.3 and now has become an IOException. I was more trying to figure out why it is exceeding the limit even though the replication factor is 3. Also, Hive may use CombineInputSplit or some version of it; are we saying it will always exceed the limit of 10?
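
[Archive note: one possible reading of the numbers in the error quoted at the bottom of this thread: with a combining input format, a single split spans several blocks, and its location list is the union of each block's replica hosts. A split covering 5 blocks at replication 3 can name up to 5 × 3 = 15 distinct hosts, which would match "splitsize: 15 maxsize: 10" even though no single file exceeds replication 3. This is an inference, not something stated in the thread.]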


On Thu, Sep 19, 2013 at 10:05 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
We have this job submit property buried in Hive that defaults to 10. We should make that configurable.


On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <harsh@cloudera.com> wrote:
Do your input files carry a replication factor of 10+? That could be one cause behind this.
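
[Archive note: a quick way to check replication, where /path/to/input is a hypothetical stand-in for the real input location.]

    # Print filename and replication factor for each input file (sketch)
    hadoop fs -stat "%n: replication %r" /path/to/input/*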

On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <murtazadoctor@gmail.com> wrote:
> Folks,
>
> Anyone run into this issue before:
> java.io.IOException: Max block location exceeded for split: Paths:
> "/foo/bar...."
> ....
> InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
> splitsize: 15 maxsize: 10
> at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
> at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
> at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
> at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
> at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apac