crunch-dev mailing list archives

From "Chao Shi (JIRA)" <>
Subject [jira] [Created] (CRUNCH-352) Share library jars between MR stages
Date Sat, 22 Feb 2014 12:13:19 GMT
Chao Shi created CRUNCH-352:

             Summary: Share library jars between MR stages
                 Key: CRUNCH-352
             Project: Crunch
          Issue Type: Improvement
            Reporter: Chao Shi

Currently, library jars are copied to the staging directory every time an MR job is submitted.
This is time-consuming when a pipeline consists of tens of stages. Worse, the job client may
run on a network far from the cluster.

I found that Hive and Pig have, or will have, this optimization (HIVE-860 and PIG-2672). YARN
also has a similar plan (YARN-1492).

Although this is better done at the YARN/MR level, we can still implement a client-side
solution to benefit users who cannot upgrade to the latest YARN or who have to use legacy MRv1.
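
One way such a client-side solution could work is sketched below: each jar is stored once in a
shared directory under a name derived from its content hash, and later stages reuse the copy
instead of uploading it again. This is only an illustration under stated assumptions, not
Crunch's or Hadoop's implementation. The class SharedJarCache and its methods are hypothetical,
and a local directory stands in for a shared location on the cluster file system.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: copy each library jar to a shared directory at most once. */
public class SharedJarCache {
    private final Path cacheDir; // assumption: stands in for a shared cluster directory
    private int uploads = 0;     // number of real copies performed, for illustration

    public SharedJarCache(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    /** Returns the hex SHA-256 digest of the file's contents. */
    static String sha256Hex(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // reading drives the digest; nothing else to do
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Copies the jar into the cache only if an identical copy is not already there. */
    public Path ensureCached(Path localJar) throws IOException {
        Files.createDirectories(cacheDir);
        Path target = cacheDir.resolve(sha256Hex(localJar) + "-" + localJar.getFileName());
        if (!Files.exists(target)) {
            Files.copy(localJar, target);
            uploads++;
        }
        return target;
    }

    public int uploadCount() {
        return uploads;
    }
}
```

With a cache like this, every stage of a pipeline would call ensureCached for each library jar
and pass the returned path to the job, so only the first stage pays the copy cost. A real
version would also need to handle concurrent clients and cleanup of stale entries.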

This message was sent by Atlassian JIRA
