hive-issues mailing list archives

From "anishek (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
Date Fri, 04 Aug 2017 06:59:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

anishek updated HIVE-16896:
---------------------------
    Attachment:     (was: HIVE-16896.3.patch)

> move replication load related work in semantic analysis phase to execution phase using a task
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16896
>                 URL: https://issues.apache.org/jira/browse/HIVE-16896
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: anishek
>            Assignee: anishek
>         Attachments: HIVE-16896.1.patch, HIVE-16896.2.patch, HIVE-16896.3.patch
>
>
> We want to avoid creating too many tasks in memory during the analysis phase while loading data.
> Currently we load all the files in the bootstrap dump location as a {{FileStatus[]}} and then iterate over it to load objects; we should instead move to
> {code}
> org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive)
> {code}
> which would internally batch the listing and return values incrementally.
> Additionally, since we can't hand off partial tasks from the analysis phase to the execution phase, we are going to move the whole repl load functionality to the execution phase so that we can better control the creation and execution of tasks (these tasks are not Hive {{Task}} objects; we may also get rid of ReplCopyTask).
> An additional consideration to evaluate at the end of this jira is whether we want to do a multi-threaded load of the bootstrap dump.

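The description proposes replacing the up-front FileStatus[] listing with FileSystem.listFiles(Path f, boolean recursive), which hands entries back through a RemoteIterator. A minimal sketch of the consuming side in plain Java follows; it is not Hive's implementation. The RemoteIterator interface below is a local stand-in with the same hasNext()/next() shape as org.apache.hadoop.fs.RemoteIterator so the example compiles without Hadoop, and BatchedLister, its batchSize parameter, and LazyListingDemo are hypothetical names that simulate a listing served one batch at a time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;

// Local stand-in with the same shape as org.apache.hadoop.fs.RemoteIterator,
// so this sketch compiles without Hadoop on the classpath.
interface RemoteIterator<E> {
    boolean hasNext() throws IOException;
    E next() throws IOException;
}

// Hypothetical lister: pulls entries from the "remote" listing one batch at a
// time instead of materializing everything up front like a FileStatus[] would.
class BatchedLister implements RemoteIterator<String> {
    private final List<String> remote;   // simulates the bootstrap dump directory
    private final int batchSize;
    private final Deque<String> buffer = new ArrayDeque<>();
    private int fetched = 0;
    private int batches = 0;
    private int peakBuffered = 0;

    BatchedLister(List<String> remote, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.remote = remote;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next batch only once the current one has been consumed.
        if (buffer.isEmpty() && fetched < remote.size()) {
            int end = Math.min(fetched + batchSize, remote.size());
            buffer.addAll(remote.subList(fetched, end));
            fetched = end;
            batches++;
            peakBuffered = Math.max(peakBuffered, buffer.size());
        }
        return !buffer.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return buffer.poll();
    }

    int batches() { return batches; }
    int peakBuffered() { return peakBuffered; }
}

class LazyListingDemo {
    // Consumes entries as they arrive. Results are collected only so the demo
    // can print them; a real loader would process each entry and drop it.
    static List<String> loadAll(RemoteIterator<String> it) {
        List<String> loaded = new ArrayList<>();
        try {
            while (it.hasNext()) {
                loaded.add(it.next());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return loaded;
    }

    public static void main(String[] args) {
        BatchedLister lister = new BatchedLister(
                Arrays.asList("db1/t1", "db1/t2", "db1/t3", "db2/t1", "db2/t2"), 2);
        System.out.println(loadAll(lister));
        System.out.println(lister.batches() + " batches, peak buffer " + lister.peakBuffered());
    }
}
```

With five entries and a batch size of 2, the lister fetches three batches and never buffers more than two entries, whereas a FileStatus[] would hold all five at once.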

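Moving the repl load work into the execution phase is mainly about when tasks get created. The following is a hedged sketch of that idea under assumed names: IncrementalLoadDriver, maxTasksInMemory, and the String work items are hypothetical and do not correspond to Hive classes or to the Hive Task hierarchy. The driver pulls work from an iterator, builds at most a fixed number of tasks, runs them, and only then creates the next batch, so the number of live tasks stays bounded regardless of dump size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical execution-phase driver: rather than building every load task
// during semantic analysis, it creates at most maxTasksInMemory tasks,
// runs them, and only then builds the next batch.
class IncrementalLoadDriver {
    private final int maxTasksInMemory;
    private int peakTasksInMemory = 0;
    private int batchesRun = 0;

    IncrementalLoadDriver(int maxTasksInMemory) {
        if (maxTasksInMemory <= 0) {
            throw new IllegalArgumentException("maxTasksInMemory must be positive");
        }
        this.maxTasksInMemory = maxTasksInMemory;
    }

    void run(Iterator<String> workItems, Function<String, Runnable> taskFactory) {
        while (workItems.hasNext()) {
            // Create only as many tasks as the in-memory budget allows.
            List<Runnable> batch = new ArrayList<>(maxTasksInMemory);
            while (workItems.hasNext() && batch.size() < maxTasksInMemory) {
                batch.add(taskFactory.apply(workItems.next()));
            }
            peakTasksInMemory = Math.max(peakTasksInMemory, batch.size());
            // Run this batch to completion before creating more tasks.
            for (Runnable task : batch) {
                task.run();
            }
            batchesRun++;
        }
    }

    int peakTasksInMemory() { return peakTasksInMemory; }
    int batchesRun() { return batchesRun; }

    public static void main(String[] args) {
        List<String> loaded = new ArrayList<>();
        Iterator<String> items =
                Arrays.asList("db1", "db1.t1", "db1.t2", "db1.t3", "db1.t4").iterator();
        IncrementalLoadDriver driver = new IncrementalLoadDriver(2);
        // Each "task" just records the object it would load.
        driver.run(items, item -> () -> loaded.add(item));
        System.out.println(loaded + " in " + driver.batchesRun() + " batches");
    }
}
```

With a budget of 2 and the five work items in main, the driver runs three batches and never holds more than two tasks. An analysis-phase planner could not interleave creation and execution this way, since, per the description, partial task lists cannot be handed off to the execution phase.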

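For the open question about a multi-threaded bootstrap load, one possible shape is a fixed-size thread pool over independent objects. This is a sketch under assumptions, not a design the issue commits to: ParallelBootstrapLoad, loadAll, and the loadObject function are hypothetical, and each object is assumed to be loadable independently of the others.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelBootstrapLoad {
    // Loads independent objects on a fixed-size pool. invokeAll returns the
    // futures in submission order, so results line up with the input list.
    static List<String> loadAll(List<String> objects, int threads,
                                Function<String, String> loadObject) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> jobs = new ArrayList<>();
            for (String obj : objects) {
                jobs.add(() -> loadObject.apply(obj));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(jobs)) {
                results.add(f.get()); // surfaces any per-object failure
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("bootstrap load interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("loading an object failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("db1.t1", "db1.t2", "db1.t3", "db1.t4");
        // Hypothetical loader: a real one would copy data and create metadata.
        List<String> done = loadAll(tables, 2,
                t -> t + " loaded on " + Thread.currentThread().getName());
        done.forEach(System.out::println);
    }
}
```

Whether that independence holds is the main question a real implementation would face; for example, a database presumably has to exist before its tables are created, so parallelism might be limited to objects within the same level.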
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
