pig-dev mailing list archives

From "William Watson (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (PIG-5071) MapReduce concurrency Could Be Better
Date Thu, 08 Dec 2016 14:30:58 GMT

     [ https://issues.apache.org/jira/browse/PIG-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Watson resolved PIG-5071.
    Resolution: Won't Fix

Marking as "Won't Fix" since a solution would be quite difficult, or maybe impossible, and
the Tez execution engine provides a sufficient workaround.

> MapReduce concurrency Could Be Better
> -------------------------------------
>                 Key: PIG-5071
>                 URL: https://issues.apache.org/jira/browse/PIG-5071
>             Project: Pig
>          Issue Type: Wish
>            Reporter: William Watson
> We have a job that launches, after optimization, about 20 MapReduce jobs. Some of these
> are quite long running, and while Pig does an okay job of running jobs concurrently, it could
> do better, at least in this very specific case.
> The Pig job can be divided into 4 major sections like so:
> A1 -> A2 -> A3 -> A4 -> A
> B1 -> B2 -> B
> C1 -> C2 -> C3 -> C
> D1 -> D2 -> D3 -> D4 -> D
> and the sections are joined at the end:
> A + B -> AB
> AB + C -> ABC
> ABC + D -> ABCD
> In short, if C2 finishes very quickly, C3 won't be started until A2, B2, and D2 are all
> also complete. This is a problem if, say, D2 takes an hour and there are unused cluster resources
> that could be made available to C3 (and by extension A3 and B3 if their prerequisites also
> finish before D2).
> One possible workaround is to scale D2 better, but that's beside the point. I think
> Pig is capable of knowing that the prerequisites are done for certain jobs, but since it only
> kicks off jobs in "phases", it won't kick off jobs as soon as possible.
> I've taken a look at the code and I'm having a hard time working out where the issue
> is, or else I would be glad to contribute a patch.
> Is this a desirable feature, and is this directly controlled by Pig? If so, could someone
> help point me in the right direction so I can contribute a patch?
> Note: We can change this from a "wish" to an "improvement" if this feature is desired...
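
For illustration only, the difference between the "phased" launching described in the
issue and launching each job as soon as its prerequisites finish can be sketched as a
small simulation. This is not Pig's code: the job graph is copied from the description
above, but the durations (1 minute each, 60 for D2), the assumption of unlimited cluster
resources, the function names, and the model of a "phase" as "all jobs at the same
dependency depth" are hypothetical choices made for the sketch.

```python
# Job graph from the issue description: each job maps to its prerequisites.
DEPS = {
    "A1": [], "A2": ["A1"], "A3": ["A2"], "A4": ["A3"], "A": ["A4"],
    "B1": [], "B2": ["B1"], "B": ["B2"],
    "C1": [], "C2": ["C1"], "C3": ["C2"], "C": ["C3"],
    "D1": [], "D2": ["D1"], "D3": ["D2"], "D4": ["D3"], "D": ["D4"],
    "AB": ["A", "B"], "ABC": ["AB", "C"], "ABCD": ["ABC", "D"],
}

# Hypothetical durations in minutes: every job takes 1, except a slow D2.
DURATION = {job: 1 for job in DEPS}
DURATION["D2"] = 60


def depth(job):
    """Length of the longest prerequisite chain leading to this job."""
    return 0 if not DEPS[job] else 1 + max(depth(d) for d in DEPS[job])


def phased_start_times():
    """Assumed 'phased' model: a phase starts only after the whole previous phase ends."""
    levels = {}
    for job in DEPS:
        levels.setdefault(depth(job), []).append(job)
    start, clock = {}, 0
    for level in sorted(levels):
        for job in levels[level]:
            start[job] = clock
        # The phase lasts as long as its slowest job.
        clock += max(DURATION[job] for job in levels[level])
    return start


def eager_start_times():
    """Each job starts as soon as all of its own prerequisites have finished."""
    start = {}

    def finish(job):
        if job not in start:
            start[job] = max((finish(d) for d in DEPS[job]), default=0)
        return start[job] + DURATION[job]

    for job in DEPS:
        finish(job)
    return start


if __name__ == "__main__":
    print("C3 starts (phased):", phased_start_times()["C3"])
    print("C3 starts (eager): ", eager_start_times()["C3"])
```

Under these assumed durations, the phased model holds C3 back until minute 61 because it
shares a phase boundary with D2, while the eager model starts it at minute 2. The overall
finish time barely changes here, since D2 sits on the critical path; the gain is in how
early the other branches can use idle resources.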

