From: Daniel Harper <Daniel.Harper@bbc.co.uk>
To: user@hive.apache.org
Subject: [Hive 0.13.1] - Explanation/confusion over "Fatal error occurred when node tried to create too many dynamic partitions" on small dataset with dynamic partitions
Date: Wed, 15 Apr 2015 15:41:52 +0000
Hi there,

We've been encountering the exception

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. Maximum was set to: 100
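For reference, both of the limits named in the error can be raised per session before the INSERT; the values below are only illustrative, not what we currently run with:

SET hive.exec.max.dynamic.partitions=1000;
SET hive.exec.max.dynamic.partitions.pernode=200;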

On a very small dataset (180 lines) using the following setup

CREATE TABLE enriched_data (
enriched_json_data string
)
PARTITIONED BY (yyyy string, mm string, dd string, identifier string, sub_identifier string, unique_run_id string)
CLUSTERED BY (enriched_json_data) INTO 128 BUCKETS
LOCATION "${OUTDIR}";

INSERT OVERWRITE TABLE enriched_data PARTITION (yyyy, mm, dd, identifier, sub_identifier, unique_run_id)
SELECT …
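For completeness, a fully dynamic insert like this is run with the usual dynamic-partitioning session settings below; this is just the standard setup, not a copy of our exact job configuration:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;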

We've not seen this issue before (normally our dataset is billions of lines), but in this case we have a very tiny amount of data causing this issue.

After looking at the code, it appears as if this condition is failing: https://github.com/apache/hive/blob/branch-0.13/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L745

I downloaded and rebuilt the branch with a bit of debugging/stdout printing on the contents of the valToPaths map, and it fails as there are 101 entries in it

All the entries look like this:

yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000047_0
yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000048_0
yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000049_0
yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000051_0
….

We're just confused as to why Hive considers the final bit of the output path (e.g. 000047_0) to be a "dynamic partition", as this is not in our PARTITIONED BY clause.

The only thing I can think of is that the CLUSTERED BY (enriched_json_data) INTO 128 BUCKETS clause, combined with the dataset being really small (180 lines), is loading everything into one reducer task – but the hashing of each line is distributing the rows fairly uniformly, so we have > 100 buckets to write to via one reducer.
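If that is what is happening, two session-level workarounds we may try (neither verified yet) are raising the per-node limit above the bucket count, or enforcing bucketing so the reducer count matches the bucket count and the bucket files are spread across reducers rather than written by one:

SET hive.exec.max.dynamic.partitions.pernode=256;
SET hive.enforce.bucketing=true;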

Any help will be greatly appreciated.

With thanks,

Daniel Harper
Software Engineer, OTG ANT
BC5 A5