Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Mon, 29 Sep 2014 18:57:34 +0000 (UTC)
From: "Mostafa Mokhtar (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12744728.1412013157000.147556.1412017054217@Atlassian.JIRA>
In-Reply-To: <JIRA.12744728.1412013157000@Atlassian.JIRA>
References: <JIRA.12744728.1412013157000@Atlassian.JIRA>
 <JIRA.12744728.1412013157939@arcas>
Subject: [jira] [Updated] (HIVE-8291) ACID : Reading from partitioned
 bucketed tables has high overhead, 50% of time is spent in
 OrcInputFormat.getReader
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar updated HIVE-8291:
----------------------------------
    Attachment: 2014_09_28_16_48_48.jfr

Hot function profile.
Use Java Mission Control (jmc) to open the file; JMC is part of Java 7.

> ACID : Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8291
>                 URL: https://issues.apache.org/jira/browse/HIVE-8291
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.14.0
>         Environment: cn105
>            Reporter: Mostafa Mokhtar
>            Assignee: Owen O'Malley
>             Fix For: 0.14.0
>
>         Attachments: 2014_09_28_16_48_48.jfr
>
>
> Reading from bucketed, partitioned tables has significantly higher overhead than reading from non-bucketed, non-partitioned files.
> 50% of the time is spent in these two lines of code in OrcInputFormat.getReader():
> {code}
>     String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
>                                 Long.MAX_VALUE + ":");
>     ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
> {code}
> {code}
> Stack Trace	Sample Count	Percentage(%)
> hive.ql.exec.tez.MapRecordSource.pushRecord()	2,981	87.215
>   org.apache.tez.mapreduce.lib.MRReaderMapred.next()	2,002	58.572
>     mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)	2,002	58.572
>       mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()	1,984	58.046
>         hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)	1,983	58.016
>           hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)	1,891	55.325
>             hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)	1,723	50.41
>               hive.common.ValidTxnListImpl.<init>(String)	934	27.326
>               conf.Configuration.get(String, String)	621	18.169
> {code}
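
The profile above suggests the cost comes from re-reading and re-parsing the same transaction string every time a record reader is created. One possible way to avoid that is to parse the string once and reuse the result while it is unchanged. The following is a minimal, self-contained sketch of that idea, not the fix applied in Hive. TxnListCacheSketch, ParsedTxnList, and cachedParse are hypothetical names, and the "highWatermark:txn1:txn2" string layout is an assumption inferred from the Long.MAX_VALUE + ":" default shown in the snippet.

```java
import java.util.Arrays;

public class TxnListCacheSketch {

    /** Hypothetical stand-in for a parsed transaction list (not Hive's class). */
    static final class ParsedTxnList {
        final long highWatermark;
        final long[] exceptions; // excluded transaction ids, kept sorted

        ParsedTxnList(String src) {
            // Assumed layout: "<highWatermark>:<txn1>:<txn2>..."
            String[] parts = src.split(":");
            highWatermark = Long.parseLong(parts[0]);
            exceptions = new long[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                exceptions[i - 1] = Long.parseLong(parts[i]);
            }
            Arrays.sort(exceptions);
        }

        boolean isValid(long txnId) {
            return txnId <= highWatermark
                && Arrays.binarySearch(exceptions, txnId) < 0;
        }
    }

    // Single-entry cache: the last string seen and its parsed form.
    private static String lastSource;
    private static ParsedTxnList lastParsed;
    static int parseCount = 0; // exposed only to show how often we really parse

    /** Parses src only when it differs from the previously parsed string. */
    static synchronized ParsedTxnList cachedParse(String src) {
        if (!src.equals(lastSource)) {
            lastParsed = new ParsedTxnList(src);
            lastSource = src;
            parseCount++;
        }
        return lastParsed;
    }

    public static void main(String[] args) {
        String txnString = "100:5:7";
        // Simulate many record readers being opened with the same configuration.
        for (int i = 0; i < 1000; i++) {
            cachedParse(txnString);
        }
        System.out.println("parses performed: " + parseCount); // prints "parses performed: 1"
    }
}
```

This only removes the repeated parsing. The Configuration.get call that also shows up in the profile would need the same treatment, for example by reading the value once per task and passing it down.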


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)