hive-dev mailing list archives

From "Rui Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9367) CombineFileInputFormatShim#getDirIndices is expensive
Date Wed, 14 Jan 2015 05:59:34 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276527#comment-14276527 ]

Rui Li commented on HIVE-9367:
------------------------------

I just verified that the patch here reduces the getSplits time from 1s to less than 200ms. The
test table consists of a single 100GB sequence file.

> CombineFileInputFormatShim#getDirIndices is expensive
> -----------------------------------------------------
>
>                 Key: HIVE-9367
>                 URL: https://issues.apache.org/jira/browse/HIVE-9367
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>         Attachments: HIVE-9367.1.patch
>
>
> [~lirui] found that we spend quite some time in CombineFileInputFormatShim#getDirIndices.
> I looked into it, and it seems we should be able to get rid of this method completely if
> we enhance CombineFileInputFormatShim a little.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
