Date: Mon, 4 Nov 2013 06:55:25 +0000 (UTC)
From: "Ming Chen (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Chen updated MAPREDUCE-5605:
---------------------------------

    Attachment:     (was: OutputFormat.java)

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5605
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 1.0.1
>         Environment: x86-64 Linux/Unix
>                      jdk7 preferred
>            Reporter: Ming Chen
>            Assignee: Ming Chen
>         Attachments: ReduceTask.java, ReduceTaskRunner.java, ReduceTaskStatus.java, ReinitTrackerAction.java, RoundQueue.java, RunningJob.java, SequenceFileOutputFormat.java, SpillScheduler.java, Task.java
>
>
> Memory is a very important resource for bridging the gap between CPUs and I/O devices, so the idea is to maximize the use of memory to relieve the I/O bottleneck. We developed a multi-threaded task execution engine that runs in a single JVM on each node. Within the execution engine, we implemented a memory-scheduling algorithm to provide global memory management. Building on that, we developed techniques such as sequential disk access and multi-cache, and we solved the problem of full garbage collection in the JVM. We have conducted extensive experiments comparing against the native Hadoop platform. The results show that the Mammoth system can reduce job execution time by more than 40% in typical cases, without requiring any modifications to Hadoop programs. When a system is short of memory, Mammoth can improve performance by up to 4 times, as observed for I/O-intensive applications such as PageRank.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
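
The idea of global memory management across task threads in a single JVM, as described in the issue, can be illustrated with a minimal sketch. This is not the Mammoth implementation or the contents of the attached files: the class names `GlobalMemoryPool` and `MemoryPoolDemo`, the method names, and the "spill when the reservation fails" policy are all assumptions made for illustration only.

```java
/**
 * Hypothetical node-wide memory budget shared by every task thread in one JVM.
 * Illustrative only; not taken from the MAPREDUCE-5605 attachments.
 */
final class GlobalMemoryPool {
    private final long capacityBytes;
    private long usedBytes = 0;

    GlobalMemoryPool(long capacityBytes) {
        if (capacityBytes <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacityBytes = capacityBytes;
    }

    /** Reserves memory without blocking; false means the caller should spill to disk instead. */
    synchronized boolean tryReserve(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        // Compare against the remaining space to avoid overflow in usedBytes + bytes.
        if (bytes > capacityBytes - usedBytes) {
            return false;
        }
        usedBytes += bytes;
        return true;
    }

    /** Returns previously reserved memory to the shared budget. */
    synchronized void release(long bytes) {
        if (bytes < 0 || bytes > usedBytes) {
            throw new IllegalArgumentException("invalid release size: " + bytes);
        }
        usedBytes -= bytes;
    }

    synchronized long available() {
        return capacityBytes - usedBytes;
    }
}

/** Two task threads competing for one shared budget (assumed sizes, for demonstration). */
final class MemoryPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        GlobalMemoryPool pool = new GlobalMemoryPool(64L * 1024 * 1024);
        Runnable task = () -> {
            long want = 48L * 1024 * 1024;
            String name = Thread.currentThread().getName();
            if (pool.tryReserve(want)) {
                try {
                    System.out.println(name + ": buffering in memory");
                } finally {
                    pool.release(want);
                }
            } else {
                System.out.println(name + ": budget exhausted, would spill to disk");
            }
        };
        Thread a = new Thread(task, "task-a");
        Thread b = new Thread(task, "task-b");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Because all tasks draw from one pool rather than from per-task JVM heaps, memory left unused by one task is immediately available to another. A real scheduler would presumably also prioritize requests and decide which buffers to spill, which this sketch does not attempt.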