hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sushanth Sowmyan (JIRA)" <>
Subject [jira] [Updated] (HIVE-17095) Long chain repl loads do not complete in a timely fashion
Date Wed, 19 Jul 2017 19:10:00 GMT


Sushanth Sowmyan updated HIVE-17095:
       Resolution: Fixed
    Fix Version/s: 3.0.0
           Status: Resolved  (was: Patch Available)

Thanks for the review, Daniel. Pushed to master.

> Long chain repl loads do not complete in a timely fashion
> ---------------------------------------------------------
>                 Key: HIVE-17095
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning, repl
>            Reporter: sapin amin
>            Assignee: Sushanth Sowmyan
>             Fix For: 3.0.0
>         Attachments: HIVE-17095.2.patch, HIVE-17095.patch
> Per performance testing done by [~sapinamin] (thus, I'm setting him as reporter), we
were able to discover an important bug affecting replication. It has the potential to affect
other large DAGs of Tasks that hive generates as well, if those DAGs have multiple paths to
child Task nodes.
> Basically, we find that incremental REPL LOAD does not finish in a timely fashion. The
test, in this case was to add 400 partitions, and replicate them. Associated with each partition,
there was an ADD PTN and a ALTER PTN. For each of the ADD PTN tasks, we'd generate a DDLTask,
a CopyTask and a MoveTask. For each Alter ptn, there'd be a single DDLTask. And order of execution
is important, so it would chain in dependency collection tasks between phases.
> Trying to root cause this shows us that it seems to stall forever at the Driver instantiation
time, and it almost looks like the thread doesn't proceed past that point.
> Looking at logs, it seems that the way this is written, it looks for all tasks generated
that are subtrees of all nodes, without looking for duplicates, and this is done simply to
get the number of execution tasks!
> And thus, the task visitor will visit every subtree of every node, which is fine if you
have graphs that look like open trees, but is horrible for us, since we have dependency collection
tasks between each phase. Effectively, this is what's happening:
> We have a DAG, say, like this:
> 4 tasks in parallel -> DEP col -> 4 tasks in parallel -> DEP col -> ...
> This means that for each of the 4 root tasks, we will do a full traversal of every graph
(not just every node) past the DEP col, and this happens recursively, and this leads to an
exponential growth of number of tasks visited as the length and breadth of the graph increase.
In our case, we had about 800 tasks in the graph, with roughly a width of about 2-3, with
200 stages, a dep collection before and after, and this meant that leaf nodes of this DAG
would have something like 2^200 - 3^200 ways in which they can be visited, and thus, we'd
visit them in all those ways. And all this simply to count the number of tasks to schedule
- we would revisit this function multiple more times, once per each hook, once for the MapReduceCompiler
and once for the TaskCompiler.
> We have not been sending such large DAGs to the Driver, thus it has not yet been a problem,
and there are upcoming changes to reduce the number of tasks replication generates(as part
of a memory addressing issue), but we still should fix the way we do Task traversal so that
a large DAG cannot cripple us.

This message was sent by Atlassian JIRA

View raw message