Subject: Re: Issue: Max block location exceeded for split error when running hive
From: Matt Davies <matt@mattdavies.net>
To: user@hadoop.apache.org
Date: Thu, 19 Sep 2013 21:16:55 -0600

Thanks, Rahul. Our ops people have implemented the config change.

On Thursday, September 19, 2013, Rahul Jain wrote:
Matt,

It would be better for you to do a global config update: set mapreduce.job.max.split.locations to at least the number of datanodes in your cluster, in either hive-site.xml or mapred-site.xml. In either case, this is a sensible configuration update if you are going to use CombineFileInputFormat to read input data in Hive.

-Rahul
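
[Archive note: a minimal sketch of the change Rahul describes, as it might look in mapred-site.xml or hive-site.xml. The value 50 is a placeholder, not from the thread; set it to at least the number of datanodes in your cluster.]

    <!-- Cap on block locations recorded per input split (default 10).
         50 is a placeholder; use at least your datanode count. -->
    <property>
      <name>mapreduce.job.max.split.locations</name>
      <value>50</value>
    </property>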


On Thu, Sep 19, 2013 at 3:31 PM, Matt Davies <matt@mattdavies.net> wrote:
What are the ramifications of setting a hard-coded value in our scripts and then changing parameters which influence the input data size? I.e., I want to run across 1 day's worth of data, then a different day I want to run against 30 days?
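
[Archive note: one hedged answer to the hard-coding concern is to scope the override to the script rather than the cluster. The Hive CLI lets a script set Hadoop job properties with SET; 50 is again a placeholder.]

    -- Sketch: per-job override at the top of a Hive script.
    -- Keep the value at or above the cluster's datanode count.
    SET mapreduce.job.max.split.locations=50;

Since a split's location list can never name more hosts than the cluster has datanodes, a value tied to cluster size rather than input size should hold whether the job scans 1 day or 30.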




On Thu, Sep 19, 2013 at 3:11 PM, Rahul Jain <rjain7@gmail.com> wrote:
I am assuming you have looked at this already:
https://issues.apache.org/jira/browse/MAPREDUCE-5186

You do have a workaround here to increase the mapreduce.job.max.split.locations value in the Hive configuration, or do we need more than that here?

-Rahul


On Thu, Sep 19, 2013 at 11:00 AM, Murtaza Doctor <murtazadoctor@gmail.com> wrote:
It used to throw a warning in 1.0.3 and now has become an IOException. I was more trying to figure out why it is exceeding the limit even though the replication factor is 3. Also, Hive may use CombineInputSplit or some version of it; are we saying it will always exceed the limit of 10?
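
[Archive note: one possible reading of the numbers in the error quoted at the bottom of this thread: with a combining input format, a single split spans several blocks, and its location list is the union of each block's replica hosts. A split covering 5 blocks at replication 3 can name up to 5 × 3 = 15 distinct hosts, which would match "splitsize: 15 maxsize: 10" even though no single file exceeds replication 3. This is an inference, not something stated in the thread.]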


On Thu, Sep 19, 2013 at 10:05 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
We have this job submit property buried in Hive that defaults to 10. We should make that configurable.


On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <harsh@cloudera.com> wrote:
Do your input files carry a replication factor of 10+? That could be one cause behind this.
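
[Archive note: a quick way to check replication, where /path/to/input is a hypothetical stand-in for the real input location.]

    # Print filename and replication factor for each input file (sketch)
    hadoop fs -stat "%n: replication %r" /path/to/input/*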

On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <murtazadoctor@gmail.com> wrote:
> Folks,
>
> Anyone run into this issue before:
> java.io.IOException: Max block location exceeded for split: Paths:
> "/foo/bar...."
> ....
> InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
> splitsize: 15 maxsize: 10
> at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
> at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
> at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
> at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
> at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apac