Date: Tue, 15 Dec 2015 22:27:46 +0000 (UTC)
From: "Prasanth Jayachandran (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-12682) Reducers in dynamic partitioning job spend a lot of time running hadoop.conf.Configuration.getOverlay

[ https://issues.apache.org/jira/browse/HIVE-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058985 ]

Prasanth Jayachandran commented on HIVE-12682:
----------------------------------------------

I don't think we need the task id for the sorted dynamic partition optimization.
Since sorted dynamic partition already has the bucket number in the key, we can just pass the "000000_0" string to the replace function along with the bucket number.

> Reducers in dynamic partitioning job spend a lot of time running hadoop.conf.Configuration.getOverlay
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-12682
>                 URL: https://issues.apache.org/jira/browse/HIVE-12682
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.1
>            Reporter: Carter Shanklin
>            Assignee: Gopal V
>         Attachments: reducer.png
>
>
> I tested this on Hive 1.2.1, but it looks like it's still applicable to 2.0.
> I ran this query:
> {code}
> create table flights (
> …
> )
> PARTITIONED BY (Year int)
> CLUSTERED BY (Month)
> SORTED BY (DayofMonth) into 12 buckets
> STORED AS ORC
> TBLPROPERTIES("orc.bloom.filter.columns"="*")
> ;
> {code}
> (Taken from here: https://github.com/t3rmin4t0r/all-airlines-data/blob/master/ddl/orc.sql)
> I profiled just the reduce phase and noticed something odd; the attached graph shows where time was spent during the reducer phase.
> !reducer.png!
> The problem seems to relate to https://github.com/apache/hive/blob/branch-2.0/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L903
> /cc [~gopalv]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
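
The suggestion in the comment above can be sketched roughly as follows. This is a hypothetical illustration, not Hive's actual FileSinkOperator code: the class name `BucketFileName`, the template constant, and the method `forBucket` are all made up for the example. The idea is that when the reducer already knows the bucket number from the sort key, the per-row task-id lookup from the Configuration (the getOverlay hotspot) can be avoided by substituting the bucket number into a constant "000000_0"-style template.

{code}
public final class BucketFileName {
    // Template following Hive's bucket file naming convention, e.g. "000000_0"
    private static final String TEMPLATE = "000000_0";

    /** Substitute the bucket number into the task-id portion of the template. */
    public static String forBucket(int bucketNum) {
        int underscore = TEMPLATE.indexOf('_');
        String taskIdPart = TEMPLATE.substring(0, underscore);
        String num = Integer.toString(bucketNum);
        // zero-pad the bucket number to the task-id width, keep the attempt suffix
        StringBuilder sb = new StringBuilder();
        for (int i = num.length(); i < taskIdPart.length(); i++) {
            sb.append('0');
        }
        sb.append(num).append(TEMPLATE.substring(underscore));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(forBucket(0));   // 000000_0
        System.out.println(forBucket(11));  // 000011_0
    }
}
{code}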