hive-issues mailing list archives

From "Steve Loughran (Jira)" <>
Subject [jira] [Commented] (HIVE-22411) Performance degradation on single row inserts
Date Thu, 31 Oct 2019 16:06:00 GMT


Steve Loughran commented on HIVE-22411:

patch looks functional to me at a glance

There is still a cost to all these list operations. Is there actually a way to avoid them, such as having whatever commits the output pass up the details of what has changed?

> Performance degradation on single row inserts
> ---------------------------------------------
>                 Key: HIVE-22411
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>             Fix For: 4.0.0
>         Attachments: HIVE-22411.1.patch, Screen Shot 2019-10-17 at 8.40.50 PM.png
> Executing single insert statements on a transactional table affects write performance
on an S3 file system. Each insert creates a new delta directory. After each insert, Hive calculates
statistics such as the number of files in the table and the total size of the table. To calculate
these, it traverses the directory recursively, and during the recursion a separate listStatus
call is executed for each path. In the end, the more delta directories you have, the more time it takes to calculate
the statistics.
> Therefore insertion time goes up linearly:
> !Screen Shot 2019-10-17 at 8.40.50 PM.png|width=601,height=436!
> The fix is to use fs.listFiles(path, /*recursive*/ true) instead of the handcrafted recursive listing.
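For illustration, the two listing strategies can be sketched against the local filesystem with java.nio, standing in for Hadoop's FileSystem and S3 (the class and method names below are invented for the sketch; only fs.listFiles and listStatus are the real Hadoop names the issue refers to):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Sketch of the two listing strategies from the issue. The point is the
// call pattern, not the API: the handcrafted recursion issues one
// directory listing per delta directory (each of which is a round trip
// on S3), while a single recursive walk -- analogous to
// fs.listFiles(path, true) -- needs only one traversal.
public class ListingSketch {

    // Handcrafted recursion: one newDirectoryStream call per directory,
    // mirroring the per-path listStatus calls the patch removes.
    static int countFilesRecursively(Path dir, int[] listCalls) throws IOException {
        int files = 0;
        listCalls[0]++; // one listing per directory -- grows with delta count
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)) {
                    files += countFilesRecursively(entry, listCalls);
                } else {
                    files++;
                }
            }
        }
        return files;
    }

    // Flat recursive listing, analogous to fs.listFiles(path, true):
    // a single traversal yields every file under the root.
    static long countFilesFlat(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a toy table layout: five delta directories, one file each.
        Path table = Files.createTempDirectory("table");
        for (int i = 0; i < 5; i++) {
            Path delta = Files.createDirectory(table.resolve("delta_" + i));
            Files.createFile(delta.resolve("bucket_0"));
        }
        int[] listCalls = {0};
        int recursive = countFilesRecursively(table, listCalls);
        long flat = countFilesFlat(table);
        System.out.println(recursive + " files found with " + listCalls[0]
                + " per-directory listings vs a single walk (flat count=" + flat + ")");
    }
}
```

With five delta directories the recursive version performs six listings (root plus one per delta), and that count keeps growing with every insert, which is the linear degradation shown in the attached chart.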

This message was sent by Atlassian Jira
