Date: Fri, 29 Aug 2014 09:09:03 +0900
Subject: [DISCUSS/VOTE] Refactor of message queue
From: "Edward J. Yoon"
To: "dev@hama.apache.org"

First of all, our main problem is that the current system requires a lot of memory, especially the graph module. As you might already know, the main memory consumer is the message queue. To solve this problem, we considered using local disk space, e.g., DiskQueue and SpillingQueue.
However, those queues cannot bundle and group messages by destination server in a memory-efficient way, so I don't think this approach is the right one.

My solution for reducing memory usage while avoiding performance degradation is to store serializable message objects as byte arrays in the queue. In the graph case, I expect 3X ~ 6X better memory efficiency than before (a GraphJobMessage consists of multiple objects: the destination vertex ID and the message value).

In 0.6.4, the outgoing queue was replaced with an outgoing bundles manager, and it showed a nice memory improvement. Now I want to start refactoring the incoming queue. My plan is to add an incoming bundles manager. Bundles can also simply be written to local disk when there is not enough memory, so the incoming bundles manager could take over a role similar to DiskQueue and SpillingQueue in the future.

If you have any other opinions, please let me know. If there are no objections, I'll do it.

--
Best Regards, Edward J. Yoon
CEO at DataSayer Co., Ltd.
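
For illustration only, here is a minimal sketch of the byte-array idea described above: messages are serialized straight into one growing buffer per destination server instead of being kept as individual objects. The class name `ByteBundleSketch`, its methods, and the (int vertex ID, double value) message layout are all hypothetical and are not Hama's actual API or the real GraphJobMessage format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not Hama code: groups serialized messages by destination. */
class ByteBundleSketch {

  /** Illustrative message shape: a destination vertex ID plus a value. */
  static final class Msg {
    final int vertexId;
    final double value;

    Msg(int vertexId, double value) {
      this.vertexId = vertexId;
      this.value = value;
    }
  }

  // One byte buffer per destination server, rather than one object per message.
  private final Map<String, ByteArrayOutputStream> buffers = new HashMap<>();

  /** Serializes the message directly into the destination's buffer (12 bytes each). */
  void add(String destination, int vertexId, double value) throws IOException {
    ByteArrayOutputStream buf =
        buffers.computeIfAbsent(destination, d -> new ByteArrayOutputStream());
    DataOutputStream out = new DataOutputStream(buf);
    out.writeInt(vertexId);   // 4 bytes
    out.writeDouble(value);   // 8 bytes
    out.flush();
  }

  /** Returns the bundle for one destination; it could be sent or spilled to disk as-is. */
  byte[] bundleFor(String destination) {
    ByteArrayOutputStream buf = buffers.get(destination);
    return buf == null ? new byte[0] : buf.toByteArray();
  }

  /** Decodes a bundle back into messages on the receiving side. */
  static List<Msg> decode(byte[] bundle) throws IOException {
    List<Msg> messages = new ArrayList<>();
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bundle));
    while (in.available() > 0) {
      int vertexId = in.readInt();
      double value = in.readDouble();
      messages.add(new Msg(vertexId, value));
    }
    return messages;
  }

  public static void main(String[] args) throws IOException {
    ByteBundleSketch sketch = new ByteBundleSketch();
    sketch.add("server-a", 1, 0.5);
    sketch.add("server-b", 7, 2.0);
    sketch.add("server-a", 2, 1.5);
    byte[] bundle = sketch.bundleFor("server-a");
    System.out.println("server-a bundle bytes: " + bundle.length);
    System.out.println("server-a messages: " + decode(bundle).size());
  }
}
```

Because each bundle is already a flat byte array, the same bytes could in principle be written to local disk when memory runs short, which is the role the email suggests an incoming bundles manager might take over later.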