hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-11675) make use of file footer PPD API in ETL strategy or separate strategy
Date Wed, 02 Mar 2016 02:30:18 GMT

     [ https://issues.apache.org/jira/browse/HIVE-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-11675:
------------------------------------
    Attachment: HIVE-11675.09.patch

The test failed only because I forgot to enable the ORC PPD setting in it (the check was added
in the 08 patch to make sure ORC metastore PPD runs only when ORC PPD is enabled). Spark timeouts
are a known issue elsewhere. [~prasanth_j] ping?

> make use of file footer PPD API in ETL strategy or separate strategy
> --------------------------------------------------------------------
>
>                 Key: HIVE-11675
>                 URL: https://issues.apache.org/jira/browse/HIVE-11675
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-11675.01.patch, HIVE-11675.02.patch, HIVE-11675.03.patch, HIVE-11675.04.patch,
>                      HIVE-11675.05.patch, HIVE-11675.06.patch, HIVE-11675.07.patch, HIVE-11675.08.patch,
>                      HIVE-11675.09.patch, HIVE-11675.patch
>
>
> Need to take a look at the best flow. It won't be much different if we do a filtering metastore
> call for each partition, so perhaps we'd need the custom sync point/batching after all.
> Or we can make it opportunistic and not fetch any footers unless the filtering can be pushed down
> to the metastore or the footers can be fetched from the local cache; that way the only slow
> threaded op is directory listings.
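
A rough illustration of the "opportunistic" option described above, offered only as a sketch and not as the code in any of the attached patches. Every name here (OpportunisticFooterSketch, selectFiles, metastoreKeeps, cachedMaxByPath, minWanted) is hypothetical, and the "footer" is reduced to a single cached max-value statistic. The point is that no footer is ever read from a file: a file is dropped only when a pushed-down metastore filter or an already-cached footer rules it out, and is kept otherwise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

public class OpportunisticFooterSketch {

    /**
     * Selects the files to read from an already-completed directory listing.
     *
     * @param listed          paths returned by the directory listing
     * @param metastoreKeeps  hypothetical pushed-down filter; empty when
     *                        pushdown to the metastore is not available
     * @param cachedMaxByPath hypothetical local footer cache, reduced here to
     *                        one "max column value" statistic per file
     * @param minWanted       lower bound from the query predicate (col >= minWanted)
     */
    static List<String> selectFiles(List<String> listed,
                                    Optional<Predicate<String>> metastoreKeeps,
                                    Map<String, Long> cachedMaxByPath,
                                    long minWanted) {
        List<String> selected = new ArrayList<>();
        for (String path : listed) {
            // 1. Use the metastore-side filter only when pushdown is possible.
            if (metastoreKeeps.isPresent() && !metastoreKeeps.get().test(path)) {
                continue;
            }
            // 2. Use a footer only if it is already in the local cache.
            Long cachedMax = cachedMaxByPath.get(path);
            if (cachedMax != null && cachedMax < minWanted) {
                continue;
            }
            // 3. No footer fetch from the file itself: unknown files are kept.
            selected.add(path);
        }
        return selected;
    }
}
```

With a cache holding a.orc (max 5) and b.orc (max 50), no metastore pushdown, and minWanted = 10, only a.orc is pruned; an uncached c.orc is kept without touching its footer.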



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
