hadoop-mapreduce-issues mailing list archives

From "Ming Chen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-5605) Memory-centric MapReduce aiming to solve the I/O bottleneck
Date Mon, 04 Nov 2013 11:33:17 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Chen updated MAPREDUCE-5605:
---------------------------------

    Description: Memory is a very important resource for bridging the gap between CPUs and I/O
devices, so the idea is to maximize the use of memory to relieve the I/O bottleneck. We developed
a multi-threaded task execution engine that runs in a single JVM on a node. In the execution
engine, we implemented a memory-scheduling algorithm to realize global memory management, and on
top of it we built techniques such as sequential disk access and a multi-cache, and addressed the
problem of full garbage collection in the JVM. The benchmark results show that it achieves an
impressive improvement in typical cases. When a system is relatively short of memory (e.g., in
HPC or in small- and medium-sized enterprises), the improvement is even larger.  (was: Memory is
a very important resource for bridging the gap between CPUs and I/O devices, so the idea is to
maximize the use of memory to relieve the I/O bottleneck. We developed a multi-threaded task
execution engine that runs in a single JVM on a node. In the execution engine, we implemented a
memory-scheduling algorithm to realize global memory management, and on top of it we built
techniques such as sequential disk access and a multi-cache, and addressed the problem of full
garbage collection in the JVM. We conducted extensive experiments comparing against the native
Hadoop platform. The results show that the Mammoth system can reduce job execution time by more
than 40% in typical cases, without requiring any modification of the Hadoop programs. When a
system is short of memory, Mammoth can improve performance by up to 4 times, as observed for
I/O-intensive applications such as PageRank.)
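
The central mechanism in the description is global memory management across all tasks sharing one
JVM. As a purely illustrative sketch (not code from the attached patch; the class name
GlobalMemoryPool and the pool/buffer sizes are hypothetical), the Java fragment below shows the
general shape of such a scheme: task threads draw fixed-size buffers from one shared,
pre-allocated pool and block when the pool is exhausted, so memory is scheduled globally rather
than per task and steady-state operation creates little garbage.

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch only, not part of MAPREDUCE-5605-v1.patch.
// All names and sizes here are hypothetical.
public class GlobalMemoryPool {
    private final BlockingQueue<ByteBuffer> pool;

    public GlobalMemoryPool(long poolBytes, int bufferSize) {
        int n = (int) (poolBytes / bufferSize);
        this.pool = new ArrayBlockingQueue<>(n);
        // Pre-allocate every buffer up front so steady-state operation creates
        // no new garbage, which is one way to keep full GC pauses at bay.
        for (int i = 0; i < n; i++) {
            pool.add(ByteBuffer.allocate(bufferSize));
        }
    }

    /** A task thread blocks here until a chunk of the shared pool is free. */
    public ByteBuffer acquire() throws InterruptedException {
        return pool.take();
    }

    /** Returns the chunk once the task has spilled or consumed its contents. */
    public void release(ByteBuffer buf) throws InterruptedException {
        buf.clear();
        pool.put(buf);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sizes: a 64 MB pool shared by all tasks, 4 MB buffers.
        GlobalMemoryPool pool = new GlobalMemoryPool(64L << 20, 4 << 20);
        ByteBuffer buf = pool.acquire(); // e.g. a map task takes a buffer for its sort output
        buf.putInt(42);
        pool.release(buf);               // and hands it back once the data is spilled
    }
}
{code}

Because every task goes through the same pool, the memory any one task can hold is bounded
globally; the multi-cache and sequential-spill techniques mentioned in the description would
build on a property of this kind.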

> Memory-centric MapReduce aiming to solve the I/O bottleneck
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5605
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5605
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 1.0.1
>         Environment: x86-64 Linux/Unix
> jdk7 preferred
>            Reporter: Ming Chen
>            Assignee: Ming Chen
>         Attachments: MAPREDUCE-5605-v1.patch
>
>
> Memory is a very important resource for bridging the gap between CPUs and I/O devices, so the
> idea is to maximize the use of memory to relieve the I/O bottleneck. We developed a
> multi-threaded task execution engine that runs in a single JVM on a node. In the execution
> engine, we implemented a memory-scheduling algorithm to realize global memory management, and
> on top of it we built techniques such as sequential disk access and a multi-cache, and
> addressed the problem of full garbage collection in the JVM. The benchmark results show that
> it achieves an impressive improvement in typical cases. When a system is relatively short of
> memory (e.g., in HPC or in small- and medium-sized enterprises), the improvement is even larger.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
