crunch-dev mailing list archives

From "Michael Rose (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-602) Combiner initialization repeatedly retrieves RT nodes from DistCache, leading to high NN load
Date Fri, 14 Oct 2016 16:20:20 GMT


Michael Rose commented on CRUNCH-602:

[~mkwhitacre] Sorry for never replying. We don't use the DistCache ourselves, which is why our
solution works for us. There's certainly a better patch out there that doesn't affect the DistCache
semantics for other use cases, but we haven't pursued it yet. I still think this is a significant
issue to fix.

Thinking about it a bit more, adding a layer of indirection might be the way to go, rather than calling
DistCache#read directly: e.g., a DistCache#readRtNodes that implements caching semantics along the
lines of [~dmi]'s patch or mine, while leaving DistCache#read unaffected.
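To illustrate the indirection idea, here is a minimal, self-contained sketch. It is not either attached patch and not Crunch's API: `DistCacheFacade`, `PayloadStore`, and the `(path, decoder)` signatures are hypothetical, and an in-memory `PayloadStore` stands in for the HDFS/NameNode fetch. The point is that `read` keeps its "fetch every time" semantics while `readRtNodes` fetches each path once and decodes on every call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical DistCache-like facade: read(...) keeps "fetch every time"
 * semantics, while readRtNodes(...) adds a cache for RT node payloads.
 */
final class DistCacheFacade {

  /** Stand-in for the remote (HDFS/NameNode) fetch; an assumption for this sketch. */
  interface PayloadStore {
    byte[] fetch(String path);
  }

  private final PayloadStore store;
  // Unbounded for the lifetime of this instance; assumes short-lived task JVMs.
  private final Map<String, byte[]> rtNodePayloads = new ConcurrentHashMap<>();

  DistCacheFacade(PayloadStore store) {
    this.store = store;
  }

  /** Unchanged semantics: every call goes back to the store. */
  <T> T read(String path, Function<byte[], T> decoder) {
    return decoder.apply(store.fetch(path));
  }

  /**
   * Cached variant: the store is hit once per path for this instance (hold it in a
   * static field to cache per JVM). The payload is decoded on every call so each
   * caller gets its own object graph; decoders must not mutate the shared bytes.
   */
  <T> T readRtNodes(String path, Function<byte[], T> decoder) {
    byte[] payload = rtNodePayloads.computeIfAbsent(path, store::fetch);
    return decoder.apply(payload);
  }
}
```

Under these assumptions, the R combiner contexts in a map task would share one fetch, while any other DistCache user calling `read` sees no behavioral change.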

> Combiner initialization repeatedly retrieves RT nodes from DistCache, leading to high
NN load
> ---------------------------------------------------------------------------------------------
>                 Key: CRUNCH-602
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.12.0, 0.13.0, 0.14.0
>         Environment: Crunch 0.14-SNAPSHOT, CDH5.6.0
>            Reporter: Michael Rose
>            Assignee: Josh Wills
>              Labels: performance
>         Attachments: crunch-602.patch, quickfix_crunch_distcache.patch
> When running one of our Crunch pipelines, we noticed our NameNode was under very heavy load.
We run our masters on fairly light hardware, so our NN was sitting at 100% CPU.
> Crunch reads the RTNodes when a CrunchTaskContext is created, which happens whenever a
Mapper or Reducer is created. Importantly, CrunchCombiner is a subclass of Reducer, so each
mapper creates R combiners (where R is the number of reducers) and therefore R CrunchTaskContexts.
Consequently, a highly parallel job with M mappers makes M*R moderately expensive calls to the NameNode.
> In the constructor for CrunchTaskContext, this is the read from the DistCache:
> this.nodes = (List<RTNode>) DistCache.read(conf, path);
> This results in a NameNode read plus deserialization.
> For now, we took the overly simplistic approach of caching the results of the DistCache
read in a Guava cache. The cache lets combiners reuse what was read, leaving only the overhead of
deserialization, which is somewhat unavoidable because RTNodes are stateful and cannot be reused
across contexts. However, the cache is not configurable except by modifying code.
> I'll attach the patch, but since it's not yet configurable I wouldn't call it a
"fix available." There may also be much better ways to fix this issue; if you have
some guidance, I'd be happy to do the legwork on a patch.
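The description notes that only deserialization overhead remains because RTNodes are stateful and cannot be reused. A minimal sketch of why caching the serialized form, rather than the deserialized objects, preserves that guarantee, using plain Java serialization as an assumption. `StatefulNode` and `SerializedNodeCache` are hypothetical; they are not Crunch's RTNode or the attached Guava-based patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a stateful RTNode; not Crunch's actual class. */
class StatefulNode implements Serializable {
  private static final long serialVersionUID = 1L;
  final String name;
  int emitted; // per-task mutable state that must not leak between contexts

  StatefulNode(String name) {
    this.name = name;
  }
}

/** Holds one serialized payload and hands out an independent deserialized copy per call. */
final class SerializedNodeCache {
  private final byte[] payload;

  SerializedNodeCache(List<StatefulNode> nodes) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(new ArrayList<>(nodes)); // copy into a known-serializable list
    }
    this.payload = bytes.toByteArray();
  }

  /** Deserialization cost is paid on each call, but no state is shared between copies. */
  @SuppressWarnings("unchecked")
  List<StatefulNode> freshCopy() throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
      return (List<StatefulNode>) in.readObject();
    }
  }
}
```

Mutating a node in one copy leaves every other copy untouched, which is the property each combiner's CrunchTaskContext needs; the trade-off is that the NameNode read is paid once while deserialization is still paid per context.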

This message was sent by Atlassian JIRA
