tez-issues mailing list archives

From "Jonathan Eagles (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-3115) Shuffle string handling adds significant memory overhead
Date Fri, 26 Feb 2016 16:28:18 GMT

    [ https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169266#comment-15169266 ]

Jonathan Eagles commented on TEZ-3115:
--------------------------------------

Patch 2 summary.

- Host and attempt are now the fundamental storage types. Created several subtypes that allow
us to intern the host and path component immediately after processing the DataMovementEvent. This
lets us keep a single copy not only of exact strings but also of their derivatives:
(host -> host, host-port, host-port-partition) and (path component -> path component, path
component-partition). A few scenarios unrelated to string handling still need improvement
(extremely large auto-reduce parallelism and a large number of empty partitions). Filed TEZ-3144
and TEZ-3145 to address those scenarios.
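
For illustration only, the interning idea above can be sketched roughly as follows. This is
not the code from the patch: the class names (InternPool, HostPortPartition), the ":" separator,
and the port value used in the usage note are all assumptions made for the example. It shows a
host string and its derivatives (host-port, host-port-partition) being canonicalized once, at
construction time, so that equal values share one String instance.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical intern pool; not a class from the TEZ-3115 patch. */
final class InternPool {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the canonical instance for any string equal to s. */
    String intern(String s) {
        String previous = pool.putIfAbsent(s, s);
        return previous == null ? s : previous;
    }

    /** Number of distinct strings currently held. */
    int size() {
        return pool.size();
    }
}

/** Hypothetical holder for a host and its derived strings. */
final class HostPortPartition {
    final String host;
    final String hostPort;
    final String hostPortPartition;

    HostPortPartition(InternPool pool, String host, int port, int partition) {
        // Intern each form as soon as it is built, so duplicates are dropped
        // right away instead of accumulating per shuffle input.
        this.host = pool.intern(host);
        this.hostPort = pool.intern(this.host + ":" + port);
        this.hostPortPartition = pool.intern(this.hostPort + ":" + partition);
    }
}
```

With this sketch, two HostPortPartition objects built for the same host, port, and partition
(for example "node1", 13562, 0) end up referencing the same three String instances, and the
pool holds three entries rather than six.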

> Shuffle string handling adds significant memory overhead
> --------------------------------------------------------
>
>                 Key: TEZ-3115
>                 URL: https://issues.apache.org/jira/browse/TEZ-3115
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Jason Lowe
>            Assignee: Jonathan Eagles
>         Attachments: TEZ-3115.1.patch, TEZ-3115.2.patch
>
>
> While investigating the OOM heap dump from TEZ-3114 I noticed that the ShuffleManager
> and other shuffle-related objects were holding onto many strings that added up to over a hundred
> megabytes of memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
