hive-issues mailing list archives

From "Ashutosh Chauhan (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-23158) Optimize S3A recordReader policy for Random IO formats
Date Thu, 09 Apr 2020 16:25:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-23158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17079529#comment-17079529 ]

Ashutosh Chauhan commented on HIVE-23158:
-----------------------------------------

+1

> Optimize S3A recordReader policy for Random IO formats
> ------------------------------------------------------
>
>                 Key: HIVE-23158
>                 URL: https://issues.apache.org/jira/browse/HIVE-23158
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Trivial
>              Labels: pull-request-available
>         Attachments: HIVE-23158.01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The S3A filesystem client (inherited from Hadoop) supports the notion of input policies.
> These policies tune the behaviour of the HTTP requests used to read different file types,
> such as TEXT or ORC.
> For formats such as ORC and Parquet that perform many seek operations, there is an optimized
> RANDOM mode that reads files only partially instead of fully (the default).
> I suggest adding some extra logic to HiveInputFormat to make sure we optimize RecordReader
> requests for random IO when data is stored on S3A in formats such as ORC or Parquet.
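
For illustration only, the kind of check described above could look roughly like the sketch below. It is not taken from HIVE-23158.01.patch. The class and method names are hypothetical, and a plain Map stands in for the Hadoop job configuration so the example runs without Hadoop on the classpath. The property name fs.s3a.experimental.input.fadvise and the value "random" are the ones the Hadoop S3A documentation uses for the input policy. Leaving an explicitly configured policy untouched is an assumption of this sketch, not something the issue states.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Hive: decides whether a split should be
// read with the S3A "random" input policy.
final class S3aInputPolicySketch {
    // S3A input policy property as named in the Hadoop S3A documentation.
    static final String FADVISE_KEY = "fs.s3a.experimental.input.fadvise";

    // Assumption: the seek-heavy formats named in the issue description.
    static final Set<String> RANDOM_IO_FORMATS = Set.of("orc", "parquet");

    // True when the path is on S3A and the format is seek-heavy.
    static boolean wantsRandomIo(URI path, String format) {
        return path != null && format != null
            && "s3a".equalsIgnoreCase(path.getScheme())
            && RANDOM_IO_FORMATS.contains(format.toLowerCase(Locale.ROOT));
    }

    // The Map stands in for a Hadoop Configuration/JobConf in this sketch.
    // putIfAbsent leaves a policy the user already set untouched (assumption).
    static void applyPolicy(Map<String, String> conf, URI path, String format) {
        if (wantsRandomIo(path, format)) {
            conf.putIfAbsent(FADVISE_KEY, "random");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Hypothetical bucket and path, used only to exercise the check.
        applyPolicy(conf, URI.create("s3a://example-bucket/warehouse/t/part-0"), "ORC");
        System.out.println(conf);
    }
}
```

In a real input format the same decision would be made on the job configuration before the RecordReader opens the file, so that text files and data on other filesystems keep their existing read behaviour.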



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
