Date: Tue, 26 Feb 2013 15:16:13 +0000 (UTC)
From: "Thomas Jungblut (JIRA)"
To: dev@hama.apache.org
Subject: [jira] [Commented] (HAMA-723) Implement sorting in spilling queue.

[ https://issues.apache.org/jira/browse/HAMA-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587187#comment-13587187 ]

Thomas Jungblut commented on HAMA-723:
--------------------------------------

Oh great. Personally, I would totally rethink the messaging. You mentioned earlier the 16k buffer that gets sorted using Quicksort. I think this is the way to go: we should materialize messages into a DataOutputBuffer as soon as they are passed to send() (there are N buffers for the outgoing peers, lazily initialized). Once a threshold is exceeded (given Hadoop RPC's overhead, I guess 4 MB should be optimal?), we sort the buffer, apply compression if configured, and send it via RPC.
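
To make the sender side concrete, here is a minimal sketch of the buffer-sort-flush idea, not Hama's actual code. The class name SortingSendBuffers, the sizeOf estimator, and the transport callback are hypothetical, and it assumes one buffer per peer. It also keeps message objects in a list where the proposal serializes into a DataOutputBuffer, and it leaves out compression and the RPC call itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.ToIntFunction;

/** Hypothetical sketch: per-peer buffers that are sorted and flushed past a threshold. */
class SortingSendBuffers<M extends Comparable<M>> {
  private final Map<String, List<M>> buffers = new HashMap<>();
  private final Map<String, Integer> bufferedBytes = new HashMap<>();
  private final int thresholdBytes;                     // e.g. 4 * 1024 * 1024
  private final ToIntFunction<M> sizeOf;                // assumed serialized-size estimate
  private final BiConsumer<String, List<M>> transport;  // stand-in for the RPC call

  SortingSendBuffers(int thresholdBytes, ToIntFunction<M> sizeOf,
                     BiConsumer<String, List<M>> transport) {
    this.thresholdBytes = thresholdBytes;
    this.sizeOf = sizeOf;
    this.transport = transport;
  }

  /** Buffers a message for a peer; the peer's buffer is created on first use. */
  void send(String peer, M message) {
    buffers.computeIfAbsent(peer, p -> new ArrayList<>()).add(message);
    int bytes = bufferedBytes.merge(peer, sizeOf.applyAsInt(message), Integer::sum);
    if (bytes > thresholdBytes) {
      flush(peer); // threshold exceeded: the segment goes out immediately
    }
  }

  /** Sorts the peer's pending messages and hands them to the transport as one segment. */
  void flush(String peer) {
    List<M> segment = buffers.remove(peer);
    bufferedBytes.remove(peer);
    if (segment == null || segment.isEmpty()) {
      return;
    }
    Collections.sort(segment);
    transport.accept(peer, segment);
  }

  /** Flushes whatever is left, e.g. at the end of a superstep. */
  void flushAll() {
    for (String peer : new ArrayList<>(buffers.keySet())) {
      flush(peer);
    }
  }
}
```

Because flush() runs as soon as a peer's buffer passes the threshold, segments leave during the computation instead of all at once at the barrier, which is the asynchronous behaviour mentioned in the comment.
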
The normal spilling queue works the same way, but without sorting. This implies that we remove the bundling, which is okay in my opinion. The receiver side should know that sorted segments are arriving and merge the data into a single file on local disk.

As part of the same change we add asynchronous messaging, because the data goes over the wire as soon as a buffer's threshold is exceeded. We should also enforce that an algorithm keeps using the same message class, as that makes it easier for us to keep a single instance and stop writing class names the whole time.

That is going to be a huge patch; should we chunk it?

> Implement sorting in spilling queue.
> ------------------------------------
>
>                 Key: HAMA-723
>                 URL: https://issues.apache.org/jira/browse/HAMA-723
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core
>            Reporter: Suraj Menon
>            Assignee: Edward J. Yoon
>             Fix For: 0.6.1, 0.7.0
>
>
> Implement sorted queue. The sender queue can send segments of sorted data and the receiver queue should implement merge sort.
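
The receiver-side merge of sorted segments could be sketched as a k-way merge over a priority queue. This is only an illustration under assumptions: SortedSegmentMerger and its Cursor helper are hypothetical names, segments are in-memory lists, and the result is returned as a list, whereas the comment proposes merging into a single file on local disk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: k-way merge of already-sorted message segments. */
class SortedSegmentMerger {

  /** The current head of one segment plus an iterator over the remainder. */
  private static final class Cursor<M> {
    M head;
    final Iterator<M> rest;

    Cursor(Iterator<M> it) {
      this.rest = it;
      this.head = it.next(); // only called for non-empty segments
    }
  }

  /** Merges sorted segments into one sorted list. */
  static <M extends Comparable<M>> List<M> merge(List<List<M>> segments) {
    // The heap orders segments by their current smallest unread element.
    PriorityQueue<Cursor<M>> heap =
        new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
    for (List<M> segment : segments) {
      if (!segment.isEmpty()) {
        heap.add(new Cursor<>(segment.iterator()));
      }
    }

    List<M> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor<M> smallest = heap.poll();
      merged.add(smallest.head);
      if (smallest.rest.hasNext()) {
        // Advance this segment and put it back with its new head.
        smallest.head = smallest.rest.next();
        heap.add(smallest);
      }
    }
    return merged;
  }
}
```

With k segments and n messages in total, each message passes through the heap once, so the merge costs O(n log k) comparisons and only needs one element per segment in memory at a time if the segments are streamed.
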