Date: Mon, 5 Dec 2016 23:25:58 +0000 (UTC)
From: "Hassan Eslami (JIRA)"
To: giraph-dev@incubator.apache.org
Reply-To: dev@giraph.apache.org
Subject: [jira] [Created] (GIRAPH-1125) Add memory estimation mechanism to out-of-core
Mailing-List: contact dev-help@giraph.apache.org; run by ezmlm

Hassan Eslami created GIRAPH-1125:
-------------------------------------

Summary: Add memory estimation mechanism to out-of-core
Key: GIRAPH-1125
URL: https://issues.apache.org/jira/browse/GIRAPH-1125
Project: Giraph
Issue Type: Improvement
Reporter: Hassan Eslami
Assignee: Hassan Eslami

The new out-of-core mechanism was designed with adaptivity in mind: we want the out-of-core mechanism to kick in only when it is necessary. In other words, when all the data (graph, messages, and mutations) fits in memory, we want to take advantage of the entire memory. And when memory runs short at some stage, only the minimal necessary amount of data should go out of core (to disk). This ensures good performance for the out-of-core mechanism.

To satisfy the adaptivity goal, we need to know how much memory is in use at each point in time. The default out-of-core mechanism (ThresholdBasedOracle) gets its memory information from the JVM's internal methods (Runtime's freeMemory()). This measure is inaccurate (and pessimistic): it counts garbage that has not yet been purged by GC as used memory. As a result, OOC behaves pessimistically and moves data out of core even when that is not necessary. For instance, consider the case where there is a lot of garbage on the heap but GC has not happened for a while. In this case, the default OOC pushes data to disk, and immediately after a major GC it brings the data back into memory. This makes the default out-of-core mechanism inefficient: if out-of-core is enabled but the data fits entirely in memory, the job still goes out of core even though doing so is not necessary.

To address this issue, we need a mechanism that tells us more accurately how much of the heap is filled with non-garbage data. Consequently, we need to change the Oracle (the OOC policy) to take advantage of a more accurate memory usage estimate.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
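
To illustrate the gap between the two kinds of measurement described in this issue, here is a minimal Java sketch. It is not Giraph code and not the proposed fix: the class MemoryEstimateSketch, its method names, and the 0.75 threshold are all hypothetical. It contrasts a reading based on Runtime's freeMemory(), which counts uncollected garbage as used, with one possible approximation of live data, namely the per-pool usage that the JVM records after its most recent collection (MemoryPoolMXBean.getCollectionUsage()). That figure is only an approximation, since pools are collected at different times and some pools do not report it at all.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical illustration only; none of these names exist in Giraph.
public class MemoryEstimateSketch {

    /** Heap fraction in use as Runtime reports it; uncollected garbage counts as used. */
    public static double naiveUsedFraction(long freeBytes, long totalBytes, long maxBytes) {
        return (double) (totalBytes - freeBytes) / maxBytes;
    }

    /** Sum over heap pools of the usage recorded after each pool's most recent GC. */
    public static long liveBytesAfterLastGc() {
        long live = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            // Null when the pool does not support post-collection usage.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc != null) {
                live += afterGc.getUsed();
            }
        }
        return live;
    }

    /** Threshold policy: move data out of core once usage exceeds the threshold. */
    public static boolean shouldSpill(double usedFraction, double threshold) {
        return usedFraction > threshold;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        double threshold = 0.75; // assumed value, chosen only for this example
        double naive = naiveUsedFraction(rt.freeMemory(), rt.totalMemory(), rt.maxMemory());
        double live = (double) liveBytesAfterLastGc() / rt.maxMemory();
        System.out.printf("naive=%.3f spill=%b%n", naive, shouldSpill(naive, threshold));
        System.out.printf("live =%.3f spill=%b%n", live, shouldSpill(live, threshold));
    }
}
```

When the heap holds a lot of garbage that has not been collected yet, the naive fraction can cross the threshold while the post-GC figure stays well below it, which is the situation where the default policy spills data unnecessarily.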