hive-dev mailing list archives

From "anishek (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-16896) move replication load related work in semantic analysis phase to execution phase using a task
Date Wed, 14 Jun 2017 09:03:00 GMT
anishek created HIVE-16896:
------------------------------

             Summary: move replication load related work in semantic analysis phase to execution phase using a task
                 Key: HIVE-16896
                 URL: https://issues.apache.org/jira/browse/HIVE-16896
             Project: Hive
          Issue Type: Improvement
            Reporter: anishek
            Assignee: anishek


We want to avoid creating too many tasks in memory during the analysis phase while loading data. Currently
we load all the files in the bootstrap dump location as {{FileStatus[]}} and then iterate
over them to load objects; we should instead move to
{code}
org.apache.hadoop.fs.RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive)
{code}

which batches the listing internally and returns entries incrementally rather than all at once.
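A minimal, self-contained sketch of the difference, using only the JDK: the {{RemoteIterator}} interface below merely mirrors the shape of {{org.apache.hadoop.fs.RemoteIterator}} (its {{hasNext()}}/{{next()}} may throw {{IOException}}), while {{PagedLister}}, {{BatchedListing}} and {{ListingDemo}} are hypothetical names, not Hadoop or Hive classes. The point is that the caller consumes entries as pages arrive, holding only the current page, instead of materializing a whole {{FileStatus[]}} first. The batch size is arbitrary.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Mirrors the shape of org.apache.hadoop.fs.RemoteIterator: both methods may throw IOException.
interface RemoteIterator<E> {
    boolean hasNext() throws IOException;
    E next() throws IOException;
}

// Hypothetical paged source standing in for the file system: returns at most `limit` names from `offset`.
interface PagedLister {
    List<String> listPage(int offset, int limit) throws IOException;
}

// Walks a listing one page at a time, so only the current page is held in memory.
final class BatchedListing implements RemoteIterator<String> {
    private final PagedLister lister;
    private final int batchSize;
    private List<String> batch = new ArrayList<>();
    private int posInBatch = 0;
    private int offset = 0;
    private boolean exhausted = false;
    private int fetches = 0;

    BatchedListing(PagedLister lister, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.lister = lister;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() throws IOException {
        if (posInBatch < batch.size()) return true;  // entries left in the current page
        if (exhausted) return false;                 // last page was short: nothing more to fetch
        batch = lister.listPage(offset, batchSize);  // fetch the next page only when needed
        fetches++;
        offset += batch.size();
        posInBatch = 0;
        exhausted = batch.size() < batchSize;
        return !batch.isEmpty();
    }

    @Override
    public String next() throws IOException {
        if (!hasNext()) throw new NoSuchElementException();
        return batch.get(posInBatch++);
    }

    // Number of page requests issued so far.
    int fetchCount() { return fetches; }
}

final class ListingDemo {
    // Processes entries one by one as they arrive; here it just collects them so they can be inspected.
    static List<String> drain(RemoteIterator<String> it) throws IOException {
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) seen.add(it.next());
        return seen;
    }

    public static void main(String[] args) throws IOException {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 10; i++) all.add("file-" + i);
        PagedLister lister = (offset, limit) -> new ArrayList<>(
            all.subList(Math.min(offset, all.size()), Math.min(offset + limit, all.size())));
        BatchedListing it = new BatchedListing(lister, 4);
        System.out.println(drain(it).size() + " entries in " + it.fetchCount() + " page requests");
    }
}
```

Draining ten entries with a batch size of 4 takes three page requests (4, 4, then 2).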

Additionally, since we can't hand off partial tasks from the analysis phase to the execution phase,
we are going to move the whole REPL LOAD functionality to the execution phase so that we can better
control the creation and execution of tasks (these are not Hive {{Task}} objects; we may get rid of ReplCopyTask).
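One way to picture the execution-phase approach, as a hedged sketch only: a driver that runs at execution time pulls dump entries from a lazy iterator and keeps at most a fixed number of pending work items per round, so the full set of load tasks never exists in memory at once. {{LoadWork}}, {{ExecutionPhaseLoader}} and {{maxTasksPerRound}} are hypothetical names, not Hive classes, and the actual design in this issue may differ.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical unit of work that loads one dumped object (database, table, partition, function, ...).
interface LoadWork {
    void execute();
}

// Hypothetical execution-phase driver: pulls dump entries lazily and keeps at most
// maxTasksPerRound work items pending, instead of building every task during analysis.
final class ExecutionPhaseLoader {
    private final int maxTasksPerRound;
    private int peakPending = 0;

    ExecutionPhaseLoader(int maxTasksPerRound) {
        if (maxTasksPerRound <= 0) throw new IllegalArgumentException("maxTasksPerRound must be positive");
        this.maxTasksPerRound = maxTasksPerRound;
    }

    // Creates and runs work in bounded rounds; returns the number of rounds executed.
    int run(Iterator<String> dumpEntries, Function<String, LoadWork> toWork) {
        int rounds = 0;
        while (dumpEntries.hasNext()) {
            List<LoadWork> pending = new ArrayList<>();
            while (dumpEntries.hasNext() && pending.size() < maxTasksPerRound) {
                pending.add(toWork.apply(dumpEntries.next()));
            }
            peakPending = Math.max(peakPending, pending.size());
            for (LoadWork work : pending) {
                work.execute();
            }
            rounds++;
        }
        return rounds;
    }

    // Largest number of work items that were pending at the same time.
    int peakPending() { return peakPending; }
}
```

With seven entries and {{maxTasksPerRound = 3}}, {{run}} executes three rounds (3, 3, then 1), and memory use is bounded by the round size rather than by the size of the dump.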

An additional consideration, at the end of this JIRA, is whether we want to specifically do
a multi-threaded load of the bootstrap dump.
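If a multi-threaded bootstrap load is pursued, a minimal sketch with a bounded {{ExecutorService}} could look like the following. It assumes, beyond what the issue states, that a database object must be loaded before its tables and that tables can be loaded independently of one another; {{ParallelBootstrapLoader}} and the {{loadObject}} callback are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelBootstrapLoader {
    // Loads the database entry first on the calling thread (assumed dependency), then loads
    // the table entries concurrently on a fixed-size pool of `threads` workers.
    static void load(String database, List<String> tables, int threads, Consumer<String> loadObject)
            throws InterruptedException, ExecutionException {
        loadObject.accept(database);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String table : tables) {
                futures.add(pool.submit(() -> loadObject.accept(table)));
            }
            // Wait for every load; a failed load surfaces here as an ExecutionException.
            for (Future<?> f : futures) {
                f.get();
            }
        } finally {
            // Accepts no new work; already-submitted loads are allowed to finish.
            pool.shutdown();
        }
    }
}
```

The fixed pool size bounds how many loads run at once regardless of how many tables the dump contains; choosing that bound, and deciding which objects are truly independent, is the open question the issue raises.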



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
