Date: Wed, 17 Aug 2016 01:05:20 +0000 (UTC)
From: "Micah Whitacre (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Subject: [jira] [Commented] (CRUNCH-602) Combiner initialization repeatedly retrieves RT nodes from DistCache, leading to high NN load

[
https://issues.apache.org/jira/browse/CRUNCH-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423696#comment-15423696 ]

Micah Whitacre commented on CRUNCH-602:
---------------------------------------

[~xorlev], were you able to isolate this down to confirm the issue?

> Combiner initialization repeatedly retrieves RT nodes from DistCache, leading to high NN load
> ---------------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-602
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-602
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.12.0, 0.13.0, 0.14.0
>         Environment: Crunch 0.14-SNAPSHOT, CDH5.6.0
>            Reporter: Michael Rose
>            Assignee: Josh Wills
>              Labels: performance
>         Attachments: crunch-602.patch
>
>
> When running one of our Crunch pipelines, we noticed our NameNode under very heavy load. We run our masters on pretty light hardware, so our NN was sitting at 100% CPU.
> Crunch reads the RTNodes during creation of a CrunchTaskContext. These are created when Mappers and Reducers are created. Importantly, a CrunchCombiner is a subclass of a Reducer, so each mapper will create R combiners where R is the number of reducers and thus R CrunchTaskContexts. Consequently in highly parallel jobs, this means M*R semi-expensive calls to the NameNode.
> In the constructor for CrunchTaskContext, this is the read to the DistCache:
>     this.nodes = (List) DistCache.read(conf, path);
> Which then leads to a read into the NN + deserialization.
> For now, we took the overly simplistic approach of caching the results of the DistCache read in a Guava cache. The cache ensures combiners reuse RTNodes with only the overhead of deserialization which is somewhat unavoidable as RTNodes are stateful and not reusable. However, it's not configurable except by modifying code.
> I'll attach the patch, but given that it's not yet configurable I wouldn't call it a "fix available." There may be much better ways of fixing this issue as well -- if you have some guidance I'd be happy to do the legwork on a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
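
The caching idea described in the quoted report can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached crunch-602.patch and not Crunch code: it swaps the Guava cache for a plain ConcurrentHashMap, uses Java serialization in place of whatever format DistCache actually uses, and assumes the cached value is the serialized bytes. The names CachedNodeReader, remoteRead, read, and serialize are hypothetical. The expensive remote read happens once per path, while every call still deserializes a fresh copy, since the report notes that RTNodes are stateful and not reusable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: not Crunch code and not the attached patch.
final class CachedNodeReader {
    // Stand-in for the expensive remote read (DistCache -> NameNode): path -> serialized bytes.
    private final Function<String, byte[]> remoteRead;
    // One entry per path; contexts must share one reader instance to share this cache.
    private final ConcurrentMap<String, byte[]> bytesByPath = new ConcurrentHashMap<>();

    CachedNodeReader(Function<String, byte[]> remoteRead) {
        this.remoteRead = remoteRead;
    }

    // Returns a freshly deserialized copy; only the first call per path invokes remoteRead.
    Object read(String path) {
        byte[] bytes = bytesByPath.computeIfAbsent(path, remoteRead);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Demo helper: Java-serialize an object to a byte array.
    static byte[] serialize(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        AtomicInteger remoteReads = new AtomicInteger();
        byte[] payload = serialize(new ArrayList<>(Arrays.asList("node-1", "node-2")));
        CachedNodeReader reader = new CachedNodeReader(path -> {
            remoteReads.incrementAndGet(); // counts simulated NameNode round trips
            return payload;
        });
        // Simulate several combiner contexts being initialized with the same path.
        for (int i = 0; i < 5; i++) {
            reader.read("hypothetical/path/to/nodes");
        }
        System.out.println("remote reads: " + remoteReads.get());
    }
}
```

With a shared reader, repeated context initializations for the same path pay only the deserialization cost after the first call. A real change would likely also need bounded size or eviction and a configuration switch, which is the gap the reporter points out.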