hive-issues mailing list archives

From "Abdullah Yousufi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-14165) Enable faster S3 Split Computation
Date Wed, 27 Jul 2016 18:29:20 GMT

     [ https://issues.apache.org/jira/browse/HIVE-14165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abdullah Yousufi updated HIVE-14165:
------------------------------------
    Description: Split size computation may be improved by the optimizations for listFiles()
in HADOOP-13208  (was: During split computation when a large number of files are required
to be listed from S3, instead of executing 1 API call per file, one can optimize by listing
1000 files in each API call. This would reduce the amount of time required for listing files.

Qubole has this optimization in place as detailed here: https://www.qubole.com/blog/product/optimizing-hadoop-for-s3-part-1/?nabe=5695374637924352:0)
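
To illustrate the saving described in the earlier description, here is a rough sketch that only counts requests. It does not call S3 or Hadoop; the function names (calls_per_file, calls_batched, list_in_pages) are hypothetical, and the page size of 1000 is simply the figure quoted above.

```python
import math

# Files returned per listing call; 1000 is the figure quoted in the description.
PAGE_SIZE = 1000


def calls_per_file(num_files: int) -> int:
    """Requests needed when every file costs its own API call."""
    return num_files


def calls_batched(num_files: int, page_size: int = PAGE_SIZE) -> int:
    """Requests needed when each call returns up to page_size files."""
    # Assumption: even an empty prefix costs one request to learn it is empty.
    return max(1, math.ceil(num_files / page_size))


def list_in_pages(keys, page_size=PAGE_SIZE):
    """Yield successive pages of keys, mimicking a paginated listing."""
    for start in range(0, len(keys), page_size):
        yield keys[start:start + page_size]


if __name__ == "__main__":
    n = 25_000
    print("per-file calls:", calls_per_file(n))
    print("batched calls: ", calls_batched(n))
```

Under these assumptions, listing 25,000 files drops from 25,000 requests to 25, which is where the reduction in split computation time would come from.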

> Enable faster S3 Split Computation
> ----------------------------------
>
>                 Key: HIVE-14165
>                 URL: https://issues.apache.org/jira/browse/HIVE-14165
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0
>            Reporter: Abdullah Yousufi
>            Assignee: Abdullah Yousufi
>
> Split size computation may be improved by the optimizations for listFiles() in HADOOP-13208



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
