From: Daniel Harper <Daniel.Harper@bbc.co.uk>
To: user@hive.apache.org
Subject: [Hive 0.13.1] - Explanation/confusion over "Fatal error occurred when node tried to create too many dynamic partitions" on small dataset with dynamic partitions
Date: Wed, 15 Apr 2015 15:41:52 +0000
Hi there,

We've been encountering the exception

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. Maximum was set to: 100
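For reference, both of the limits named in the error can be raised per session before the INSERT; the values below are only illustrative, not what we currently run with:

SET hive.exec.max.dynamic.partitions=1000;
SET hive.exec.max.dynamic.partitions.pernode=200;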

On a very small dataset (180 lines) using the following setup

CREATE TABLE enriched_data (
enriched_json_data string
)
PARTITIONED BY (yyyy string, mm string, dd string, identifier string, sub_identifier string, unique_run_id string)
CLUSTERED BY (enriched_json_data) INTO 128 BUCKETS
LOCATION "${OUTDIR}";

INSERT OVERWRITE TABLE enriched_data PARTITION (yyyy, mm, dd, identifier, sub_identifier, unique_run_id)
SELECT …
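For completeness, a fully dynamic insert like this is run with the usual dynamic-partitioning session settings below; this is just the standard setup, not a copy of our exact job configuration:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;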

We've not seen this issue before (normally our dataset is billions of lines), but in this case we have a very tiny amount of data causing this issue.

After looking at the code, it appears as if this condition is failing: https://github.com/apache/hive/blob/branch-0.13/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L745

I downloaded and rebuilt the branch with a bit of debugging/stdout printing on the contents of the valToPaths map, and it fails as there are 101 entries in it

All the entries look like this:

yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000047_0
yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000048_0
yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000049_0
yyyy=2015/mm=04/dd=09/identifier=1/sub-identifier=3/unique_run_id=df-345345/000051_0
….

We're just confused as to why Hive considers the final bit of the output path (e.g. 000047_0) to be a "dynamic partition", as this is not in our PARTITIONED BY clause.

The only thing I can think of is that the CLUSTERED BY (enriched_json_data) INTO 128 BUCKETS clause, combined with the dataset being really small (180 lines), is loading everything into one reducer task – but the hashing of each line is distributing the rows fairly uniformly, so we have > 100 buckets to write to via one reducer.
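If that is what is happening, two session-level workarounds we may try (neither verified yet) are raising the per-node limit above the bucket count, or enforcing bucketing so the reducer count matches the bucket count and the bucket files are spread across reducers rather than written by one:

SET hive.exec.max.dynamic.partitions.pernode=256;
SET hive.enforce.bucketing=true;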

Any help will be greatly appreciated.

With thanks,

Daniel Harper
Software Engineer, OTG ANT
BC5 A5