spark-issues mailing list archives

From "Reynold Xin (JIRA)" <>
Subject [jira] [Commented] (SPARK-7075) Project Tungsten: Improving Physical Execution and Memory Management
Date Wed, 29 Apr 2015 23:21:06 GMT


Reynold Xin commented on SPARK-7075:

Yup, I will post more thoughts and plans in the next few days.

> Project Tungsten: Improving Physical Execution and Memory Management
> --------------------------------------------------------------------
>                 Key: SPARK-7075
>                 URL:
>             Project: Spark
>          Issue Type: Epic
>          Components: Block Manager, Shuffle, Spark Core, SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
> Based on our observations, the majority of Spark workloads are bottlenecked not by I/O or
network, but by CPU and memory. This project focuses on three areas to improve the memory
and CPU efficiency of Spark applications and push performance closer to the limits of the
underlying hardware.
> 1. Memory management and binary processing: leveraging application semantics to manage
memory explicitly and eliminate the overhead of the JVM object model and garbage collection
> 2. Cache-aware computation: algorithms and data structures that exploit the memory hierarchy
> 3. Code generation: using code generation to exploit modern compilers and CPUs
> Several parts of project Tungsten leverage the DataFrame model, which gives us more semantics
about the application. We will also retrofit the improvements onto Spark’s RDD API whenever

This message was sent by Atlassian JIRA
