Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Sat, 4 Oct 2014 05:19:34 +0000 (UTC)
From: "Gopal V (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12744750.1412016755000.189741.1412399974655@Atlassian.JIRA>
In-Reply-To: <JIRA.12744750.1412016755000@Atlassian.JIRA>
References: <JIRA.12744750.1412016755000@Atlassian.JIRA>
 <JIRA.12744750.1412016755268@arcas>
Subject: [jira] [Commented] (HIVE-8292) Reading from partitioned bucketed
 tables has high overhead in MapOperator.cleanUpInputFileChangedOp
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158966#comment-14158966 ] 

Gopal V commented on HIVE-8292:
-------------------------------

Traced it down to this call at the top of pushRecord():

{code}
  @Override
  public boolean pushRecord() throws HiveException {
    execContext.resetRow(); // <-- resets input checks
{code}

This is why cleanUpInputFileChangedOp is being called once per row.
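
To illustrate the effect, here is a minimal sketch, assuming the per-row reset makes the operator forget which input file it last saw. InputChangeTracker and its methods are hypothetical names for illustration only; this is not Hive's ExecMapperContext or MapOperator code.

```java
/** Hypothetical model of an "input file changed" check; not Hive code. */
class InputChangeTracker {
    private String lastFile;   // file the previous row came from
    private int cleanupCalls;  // how many times the simulated expensive cleanup ran

    /** Assumption: models a per-row reset that clears the remembered input file. */
    void resetRow() {
        lastFile = null;
    }

    /** Runs the simulated cleanup only when the file differs from the last one seen. */
    void onRow(String file) {
        if (!file.equals(lastFile)) {
            lastFile = file;
            cleanupCalls++;
        }
    }

    /** Feeds rowsPerFile rows from each file and returns the number of cleanups. */
    static int simulate(String[] files, int rowsPerFile, boolean resetPerRow) {
        InputChangeTracker tracker = new InputChangeTracker();
        for (String file : files) {
            for (int i = 0; i < rowsPerFile; i++) {
                if (resetPerRow) {
                    tracker.resetRow();
                }
                tracker.onRow(file);
            }
        }
        return tracker.cleanupCalls;
    }
}
```

With two files of 1000 rows each, simulate(files, 1000, true) counts 2000 cleanups, while simulate(files, 1000, false) counts 2, one per file.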

> Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8292
>                 URL: https://issues.apache.org/jira/browse/HIVE-8292
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.14.0
>         Environment: cn105
>            Reporter: Mostafa Mokhtar
>            Assignee: Prasanth J
>             Fix For: 0.14.0
>
>         Attachments: 2014_09_29_14_46_04.jfr
>
>
> Reading from bucketed partitioned tables has significantly higher overhead than reading from non-bucketed non-partitioned files.
> 50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp.
> 5% of the CPU is spent in
> {code}
>  Path onepath = normalizePath(onefile);
> {code}
> and 45% of the CPU is spent in
> {code}
>  onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
> {code}
> From the profiler:
> {code}
> Stack Trace	Sample Count	Percentage(%)
> hive.ql.exec.tez.MapRecordSource.processRow(Object)	5,327	62.348
>    hive.ql.exec.vector.VectorMapOperator.process(Writable)	5,326	62.336
>       hive.ql.exec.Operator.cleanUpInputFileChanged()	4,851	56.777
>          hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()	4,849	56.753
>                                  java.net.URI.relativize(URI)	3,903	45.681
>                                     java.net.URI.relativize(URI, URI)	3,903	45.681
>                                        java.net.URI.normalize(String)	2,169	25.386
>                                        java.net.URI.equal(String, String)	526	6.156
>                                        java.net.URI.equalIgnoringCase(String, String)	1	0.012
>                                        java.lang.String.substring(int)	1	0.012
>             hive.ql.exec.MapOperator.normalizePath(String)	506	5.922
>             org.apache.commons.logging.impl.Log4JLogger.info(Object)	32	0.375
>                                  java.net.URI.equals(Object)	12	0.14
>                                  java.util.HashMap$KeySet.iterator()	5	0.059
>                                  java.util.HashMap.get(Object)	4	0.047
>                                  java.util.LinkedHashMap.get(Object)	3	0.035
>          hive.ql.exec.Operator.cleanUpInputFileChanged()	1	0.012
>       hive.ql.exec.Operator.forward(Object, ObjectInspector)	473	5.536
>       hive.ql.exec.mr.ExecMapperContext.inputFileChanged()	1	0.012
> {code}
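
For reference, the relativize expression quoted in the description depends on a documented property of java.net.URI.relativize: when the base path is not a prefix of the given URI's path, the given URI is returned unchanged. The sketch below shows that behavior. UriPrefixSketch, notUnder and isUnder are hypothetical names, and the string-prefix variant is only an assumption about what a cheaper check could look like when both paths are already normalized and share scheme and authority; it is not the fix applied in Hive.

```java
import java.net.URI;

/** Hypothetical helper illustrating the quoted check; not Hive code. */
class UriPrefixSketch {

    /** Same shape as the quoted expression: true when base is NOT a prefix of child. */
    static boolean notUnder(URI base, URI child) {
        // relativize returns child itself when base is not a prefix of it
        return base.relativize(child).equals(child);
    }

    /**
     * Assumed cheaper variant: plain string prefix test. Only valid if both
     * strings are already normalized and share the same scheme and authority.
     */
    static boolean isUnder(String base, String child) {
        String prefix = base.endsWith("/") ? base : base + "/";
        return child.equals(base) || child.startsWith(prefix);
    }
}
```

For a base of hdfs://nn/warehouse/t/p=1 and a file hdfs://nn/warehouse/t/p=1/000000_0, notUnder returns false and isUnder returns true; with a base of hdfs://nn/warehouse/t/p=2 the results flip.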


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)