giraph-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-1125) Add memory estimation mechanism to out-of-core
Date Wed, 14 Dec 2016 20:57:58 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749458#comment-15749458 ]

ASF GitHub Bot commented on GIRAPH-1125:
----------------------------------------

Github user edunov commented on a diff in the pull request:

    https://github.com/apache/giraph/pull/12#discussion_r92487131
  
    --- Diff: giraph-core/src/main/java/org/apache/giraph/utils/ThreadLocalProgressCounter.java ---
    @@ -0,0 +1,67 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.giraph.utils;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +/**
    + * Makes a list of {@link ProgressCounter} accessible through
    + * a {@link ThreadLocal}.
    + */
    +public class ThreadLocalProgressCounter extends ThreadLocal<ProgressCounter> {
    +  /**
    +   * List of counters.
    +   */
    +  private final List<ProgressCounter> counters = new ArrayList<>();
    +
    +  /**
    +   * Initializes a new counter, adds it to the list of counters
    +   * and returns it.
    +   * @return Progress counter.
    +   */
    +  @Override
    +  protected ProgressCounter initialValue() {
    +    ProgressCounter threadCounter = new ProgressCounter();
    +    synchronized (counters) {
    +      counters.add(threadCounter);
    +    }
    +    return threadCounter;
    +  }
    +
    +  /**
    +   * Sums the progress of all counters.
    +   * @return Sum of all counters
    +   */
    +  public long getProgress() {
    +    long progress = 0;
    +    synchronized (counters) {
    +      for (ProgressCounter entry : counters) {
    +        progress += entry.getValue();
    +      }
    +    }
    +    return progress;
    +  }
    +
    +  /**
    +   * Removes all counters.
    +   */
    +  public void reset() {
    --- End diff --
    
    What is the purpose of this function and how do you use it? 


> Add memory estimation mechanism to out-of-core
> ----------------------------------------------
>
>                 Key: GIRAPH-1125
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-1125
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Hassan Eslami
>            Assignee: Hassan Eslami
>
> The new out-of-core mechanism is designed with adaptivity in mind: we want
out-of-core to kick in only when it is necessary. In other words, when all the data
(graph, messages, and mutations) fits in memory, we want to take advantage of the entire
memory. And when memory runs short in some stage, only the minimal necessary amount of
data goes out of core (to disk). This ensures good performance for the out-of-core mechanism.
> To satisfy the adaptivity goal, we need to know how much memory is used at each point
in time. The default out-of-core mechanism (ThresholdBasedOracle) gets its memory information
from the JVM's internal methods (Runtime's freeMemory()). This method is inaccurate (and pessimistic):
it counts garbage that GC has not yet purged as used memory. Using the JVM's
default methods, OOC behaves pessimistically and moves data out of core even when it is not necessary.
For instance, consider the case where there is a lot of garbage on the heap, but GC has not
happened for a while. In this case, the default OOC pushes data to disk, and immediately after
a major GC it brings the data back to memory. This causes inefficiency in the default
out-of-core mechanism: if out-of-core is used but the data can fit entirely in memory, the job
still goes out of core even though doing so is not necessary.
> To address this issue, we need a mechanism that tells us more accurately how much of
the heap is filled with non-garbage data. Consequently, we need to change the Oracle (OOC policy)
to take advantage of a more accurate memory usage estimate.
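
The gap described above can be illustrated with standard JDK calls. The following is only a minimal sketch, not the mechanism proposed in GIRAPH-1125: the class name HeapUsageSketch, its method names, and the threshold policy in shouldOffload() are hypothetical. It contrasts the Runtime-based figure, which counts uncollected garbage as used, with the per-pool usage that the JVM records right after its most recent collection (MemoryPoolMXBean.getCollectionUsage()), which is closer to the amount of live data.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical illustration; not part of Giraph. */
public class HeapUsageSketch {

  /** Heap in use according to Runtime; includes garbage not yet collected. */
  public static long usedIncludingGarbage() {
    Runtime rt = Runtime.getRuntime();
    return rt.totalMemory() - rt.freeMemory();
  }

  /**
   * Sums, over all heap pools, the usage recorded right after each pool's
   * most recent garbage collection. Returns -1 if no heap pool reports it.
   */
  public static long usedAfterLastGc() {
    long total = 0;
    boolean supported = false;
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      if (pool.getType() != MemoryType.HEAP) {
        continue;
      }
      // null when the pool does not support collection usage
      MemoryUsage usage = pool.getCollectionUsage();
      if (usage != null) {
        total += usage.getUsed();
        supported = true;
      }
    }
    return supported ? total : -1;
  }

  /** Assumed policy: offload when usage exceeds a fraction of the maximum. */
  public static boolean shouldOffload(long usedBytes, long maxBytes,
                                      double threshold) {
    return usedBytes > threshold * maxBytes;
  }

  public static void main(String[] args) {
    System.out.println("Runtime-based usage:   " + usedIncludingGarbage());
    System.out.println("Usage after last GC:   " + usedAfterLastGc());
  }
}
```

When the heap holds a lot of garbage, the first figure can be much larger than the second, so a threshold policy fed with the first one may offload data that would have fit in memory. The post-GC figure is itself only an estimate, since each pool reports the state after its own last collection.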



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
