hadoop-mapreduce-issues mailing list archives

From "Miomir Boljanovic (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4303) Look at using String.intern to dedupe some Strings
Date Sat, 29 Sep 2012 12:42:09 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466206#comment-13466206 ]

Miomir Boljanovic commented on MAPREDUCE-4303:

Hi Robert,

A patch that attempts to address a similar issue is available in MAPREDUCE-4229 (waiting for review).
It's based on Guava's Interner implementation, which provides behavior equivalent to String.intern()
but keeps the interned strings on the heap instead of consuming permgen space.
Perhaps you could take a quick look at it? Any feedback is appreciated.

> Look at using String.intern to dedupe some Strings
> --------------------------------------------------
>                 Key: MAPREDUCE-4303
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4303
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster
>    Affects Versions: 0.23.3, 2.0.0-alpha
>            Reporter: Robert Joseph Evans
> MAPREDUCE-4301 fixes one issue with too many duplicate strings, but there are other places
> where it is not as simple to remove the duplicates.  In these cases the source of the strings
> is an incoming RPC call or the parsing and reading of a file.  The only real way to dedupe
> these is either to use String.intern(), which if not used properly could fill up the permgen
> space, or to maintain our own cache that does the same sort of thing as String.intern()
> but in the heap.
> The following are some places where I saw lots of duplicate strings that we should look at
> doing something about:
> TaskAttemptStatusUpdateEvent$TaskAttemptState.stateString
> MapTaskAttemptImpl.diagnostics
> The keys to Counters.groups
> GenericGroup.displayName
> The keys to GenericGroup.counters
> and GenericCounter.displayName
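
For illustration only, the heap-based alternative described in the issue can be sketched with nothing but the JDK. This is not the code from MAPREDUCE-4229 or from any Hadoop patch; the class name HeapStringInterner and its methods are hypothetical. It holds strong references, so pooled strings are never released. Guava's Interners.newWeakInterner() avoids that by letting unreferenced entries be garbage collected.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical heap-based string interner; an assumption-laden sketch, not Hadoop code. */
final class HeapStringInterner {
    // Canonical instances live on the ordinary heap, keyed by their own value.
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the canonical instance equal to s, registering s if it has not been seen. */
    String intern(String s) {
        if (s == null) {
            return null; // ConcurrentHashMap rejects null keys, so pass null through
        }
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct strings currently pooled. */
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        HeapStringInterner interner = new HeapStringInterner();
        // new String(...) forces distinct objects, as strings decoded from an RPC or a file would be.
        String a = interner.intern(new String("RUNNING"));
        String b = interner.intern(new String("RUNNING"));
        System.out.println("same instance: " + (a == b));
        System.out.println("pooled strings: " + interner.size());
    }
}
```

A field such as a state string or a counter display name would be passed through intern() at the point where it is read, so that equal values share one object.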

