crunch-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Rose (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CRUNCH-602) Combiner initialization repeatedly retrieves RT nodes from DistCache, leading to high NN load
Date Mon, 18 Apr 2016 19:25:25 GMT
Michael Rose created CRUNCH-602:
-----------------------------------

             Summary: Combiner initialization repeatedly retrieves RT nodes from DistCache,
leading to high NN load
                 Key: CRUNCH-602
                 URL: https://issues.apache.org/jira/browse/CRUNCH-602
             Project: Crunch
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.13.0, 0.12.0, 0.14.0
         Environment: Crunch 0.14-SNAPSHOT, CDH5.6.0
            Reporter: Michael Rose
            Assignee: Josh Wills


When running one of our Crunch pipelines, we noticed our NameNode under very heavy load. We
run our masters on pretty light hardware, so our NN was sitting at 100% CPU.

Crunch reads the RTNodes during creation of a CrunchTaskContext. These are created when Mappers
and Reducers are created. Importantly, a CrunchCombiner is a subclass of a Reducer, so each
mapper will create R combiners where R is the number of reducers and thus R CrunchTaskContexts.
Consequently in highly parallel jobs, this means M*R semi-expensive calls to the NameNode.

In the constructor for CrunchTaskContext, this is the read to the DistCache:

this.nodes = (List<RTNode>) DistCache.read(conf, path);

Which then leads to a read into the NN + deserialization.

For now, we took the overly simplistic approach of caching the results of the DistCache read
in a Guava cache. The cache ensures combiners reuse RTNodes with only the overhead of deserialization
which is somewhat unavoidable as RTNodes are stateful and not reusable. However, it's not
configurable except by modifying code.

I'll attach the patch, but given that it's not yet configurable I wouldn't call it a "fix
available." There may be much better ways of fixing this issue as well -- if you have some
guidance I'd be happy to do the legwork on a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message