carbondata-issues mailing list archives

From "Jacky Li (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CARBONDATA-1335) Duplicated & time-consuming method call found in query
Date Thu, 27 Jul 2017 13:37:00 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-1335.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.2.0

> Duplicated & time-consuming method call found in query
> ------------------------------------------------------
>
>                 Key: CARBONDATA-1335
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1335
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: data-query
>    Affects Versions: 1.1.1
>            Reporter: xuchuanyin
>            Priority: Minor
>              Labels: performance
>             Fix For: 1.2.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> # Scenario
> We ran 14 concurrent queries on CarbonData. The queries were identical but ran against different tables. We observed the following:
> + A single query took about 5s;
> + In the concurrent scenario, each query took about 15s;
> By adding checkpoints to the log, we found a long delay before the query jobs started in Spark.
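
The log checkpoints mentioned above can be sketched as a small timing helper. This is only an illustration in Python, not CarbonData's code: the `checkpoint` helper, the `timings` dictionary and the two phase names are hypothetical, and the `time.sleep` calls stand in for real work.

```python
import time
from contextlib import contextmanager

# Hypothetical store: accumulated wall-clock seconds per named phase.
timings = {}

@contextmanager
def checkpoint(name):
    """Measure the wall-clock time of the enclosed block and record it under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[name] = timings.get(name, 0.0) + elapsed

def run_query():
    # Placeholder phases standing in for client-side preparation
    # (plan analysis, block filtering) and job submission.
    with checkpoint("client_side_prepare"):
        time.sleep(0.02)
    with checkpoint("submit_job"):
        time.sleep(0.01)

run_query()
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f}s")
```

Comparing the per-phase totals between a single run and a concurrent run shows which phase accounts for the extra latency.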
> # Analysis
> When we fire a query, CarbonData first does some work on the client side, including parsing/analyzing plans and preparing filtered blocks and inputSplits. CarbonData then submits the query job to Spark.
> We found that the first step took about 7s in the concurrent scenario, but less than 1s in the single-query scenario.
> By studying the related code, we found that the most time-consuming method call was `CarbonSessionCatalog.lookupRelation`. Inside this method, `super.lookupRelation` was called twice, and each call took about 3s.
> # Solution
> CarbonData needs to call `super.lookupRelation` only once, so we should remove the redundant duplicate call.
> I've tested this in my environment and it works well. In the concurrent scenario, each query now takes about 12s (the improvement saves 3s).
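
The fix described above amounts to calling the parent lookup once and reusing its result. The following is a minimal sketch in Python, not the actual CarbonData patch: the class names, the `lookup_relation` method and the exact shape of the "before" code are assumptions, since the report only states that the parent method was called twice.

```python
class BaseCatalog:
    """Hypothetical stand-in for the parent catalog; counts parent lookups."""

    def __init__(self):
        self.calls = 0

    def lookup_relation(self, table):
        self.calls += 1  # in the report, each real call cost about 3s under load
        return {"table": table}


class DuplicatedLookupCatalog(BaseCatalog):
    def lookup_relation(self, table):
        # Before (assumed shape): the parent lookup runs twice per query.
        if super().lookup_relation(table) is None:
            raise LookupError(table)
        return super().lookup_relation(table)


class SingleLookupCatalog(BaseCatalog):
    def lookup_relation(self, table):
        # After: call the parent once and reuse the result.
        relation = super().lookup_relation(table)
        if relation is None:
            raise LookupError(table)
        return relation


before = DuplicatedLookupCatalog()
after = SingleLookupCatalog()
before.lookup_relation("t1")
after.lookup_relation("t1")
print(before.calls, after.calls)  # prints "2 1"
```

With one parent call removed per query, the saving matches the cost of a single lookup, which is consistent with the roughly 3s improvement reported.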



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
