hive-issues mailing list archives

From "Rajesh Balamohan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-15422) HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset
Date Thu, 15 Dec 2016 00:15:59 GMT

     [ https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-15422:
------------------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 2.2.0
           Status: Resolved  (was: Patch Available)

Committed to master. Thanks for the review [~sershe].

> HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15422
>                 URL: https://issues.apache.org/jira/browse/HIVE-15422
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15422.1.patch, HIVE-15422.2.patch, HIVE-15422.3.patch, Profiler_Snapshot_HIVE-15422.png
>
>
> When executing the following query in LLAP (single instance) on a 5-node cluster, heavy GC pressure was observed.
> {noformat}
> select a.type, a.city , a.frequency, b.city, b.country, b.lat, b.lon
> from (select 'depart' as type, origin as city, count(origin) as frequency
> from flights
>   group by origin
>   order by frequency desc, type) as a
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has more than 7000 partitions in S3. Profiling revealed that a large number of objects are created just for path comparisons in HiveInputFormat. HIVE-15405 reduces the number of path comparisons in FileUtils, but HiveInputFormat::pushProjectionsAndFilters still performs a large number of comparisons.
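To illustrate the general allocation pattern described in the report, here is a minimal Java sketch. It is not the HiveInputFormat code or the committed HIVE-15422 patch: the class and method names, the `dt=` partition layout, and the use of `java.net.URI` in place of Hadoop's `Path` are all hypothetical assumptions. It counts the URI objects created when each split directory is matched by re-parsing every partition path (work that grows with splits times partitions), versus normalizing the partition paths once into a hash-based lookup table.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Hive code. URI stands in for Hadoop's Path.
public class PathMatchSketch {

    // Number of URI objects created so far (not thread-safe; demo only).
    static int urisCreated = 0;

    // Parse and normalize a path string, counting the allocation.
    static URI toUri(String s) {
        urisCreated++;
        return URI.create(s).normalize();
    }

    // Naive: re-parse every partition path for every split directory.
    static String findNaive(String splitDir, List<String> partitionDirs) {
        URI split = toUri(splitDir);
        for (String p : partitionDirs) {
            if (toUri(p).equals(split)) {
                return p;
            }
        }
        return null;
    }

    // Indexed: normalize each partition path once, keyed by its URI.
    static Map<URI, String> buildIndex(List<String> partitionDirs) {
        Map<URI, String> index = new HashMap<>();
        for (String p : partitionDirs) {
            index.put(toUri(p), p);
        }
        return index;
    }

    // One parse plus one hash lookup per split directory.
    static String findIndexed(String splitDir, Map<URI, String> index) {
        return index.get(toUri(splitDir));
    }

    // Hypothetical partition layout: s3a://bucket/flights/dt=0 .. dt=(n-1).
    static List<String> partitions(int n) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add("s3a://bucket/flights/dt=" + i);
        }
        return parts;
    }

    // URIs created when every partition directory is looked up naively.
    static int countNaive(List<String> parts) {
        urisCreated = 0;
        for (String p : parts) {
            findNaive(p, parts);
        }
        return urisCreated;
    }

    // URIs created when the index is built once and then queried.
    static int countIndexed(List<String> parts) {
        urisCreated = 0;
        Map<URI, String> index = buildIndex(parts);
        for (String p : parts) {
            findIndexed(p, index);
        }
        return urisCreated;
    }

    public static void main(String[] args) {
        List<String> parts = partitions(1000);
        System.out.println("naive URIs created:   " + countNaive(parts));
        System.out.println("indexed URIs created: " + countIndexed(parts));
    }
}
```

With 1000 partitions this sketch prints 501500 URI objects for the naive loop versus 2000 for the indexed lookup. The naive count grows quadratically with the number of partitions, which is the kind of object churn that turns into GC pressure on a table with thousands of partitions.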



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
