mahout-dev mailing list archives

From "Ted Dunning (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-154) Reduce memory usage with smarter data structures
Date Fri, 31 Jul 2009 14:20:15 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737532#action_12737532 ]

Ted Dunning commented on MAHOUT-154:
------------------------------------

bq. use float, not double, for preference values. It is terribly unlikely that a float (4
bytes) is not enough precision to accurately represent user preferences, which are typically
like "3.0" or "4.5".

I haven't yet seen a situation that needs even two significant figures.  A byte should suffice.
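
As a rough illustration of the byte idea, here is a minimal sketch. It assumes that ratings sit on a fixed half-point grid (for example 0.0 to 5.0 in steps of 0.5); that grid, the class name BytePreferenceCodec and its methods are hypothetical and not part of Mahout.

```java
// Hypothetical sketch: store a preference value in one byte instead of a 4-byte float.
// Assumption: every rating is a multiple of STEP (here 0.5), so quantization is lossless.
class BytePreferenceCodec {

    private static final float STEP = 0.5f;

    // Convert a rating to its grid index and check that it fits in a signed byte.
    static byte encode(float value) {
        int index = Math.round(value / STEP);
        if (index < Byte.MIN_VALUE || index > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("Rating out of byte range: " + value);
        }
        return (byte) index;
    }

    // Recover the rating from the stored grid index.
    static float decode(byte stored) {
        return stored * STEP;
    }

    public static void main(String[] args) {
        float[] ratings = {3.0f, 4.5f};
        for (float rating : ratings) {
            byte stored = encode(rating);
            System.out.println(rating + " -> " + stored + " -> " + decode(stored));
        }
    }
}
```

Ratings that do not fall on the assumed grid would be rounded to the nearest step, so this only makes sense when the data really has that little precision.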

bq. Preference[] is an inefficient way to store prefs, since it entails a great deal of Preference
object overhead

Indeed.  A container class that reconstructs Preference objects on demand will help by turning
static memory usage into ephemeral memory usage.
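
One way such a container could look is sketched below. It keeps the data in parallel primitive arrays and builds a short-lived object only when a caller asks for one. The names ParallelPreferenceArray and Pref, and the use of long item IDs, are assumptions made for illustration; this is not the PreferenceArray API from the issue.

```java
// Hypothetical sketch of a parallel-array preference container.
// Data lives in two primitive arrays; Pref objects are rebuilt on demand
// and can be garbage collected as soon as the caller drops them.
final class ParallelPreferenceArray {

    // Short-lived value object handed out by get(); it is not stored anywhere.
    static final class Pref {
        final long itemID;
        final float value;

        Pref(long itemID, float value) {
            this.itemID = itemID;
            this.value = value;
        }
    }

    private final long[] itemIDs;   // assumption: item IDs fit in a long
    private final float[] values;

    ParallelPreferenceArray(int size) {
        this.itemIDs = new long[size];
        this.values = new float[size];
    }

    void set(int i, long itemID, float value) {
        itemIDs[i] = itemID;
        values[i] = value;
    }

    // Primitive accessors avoid allocating anything at all.
    long getItemID(int i) {
        return itemIDs[i];
    }

    float getValue(int i) {
        return values[i];
    }

    // Reconstructs a fresh Pref on each call: ephemeral rather than static memory.
    Pref get(int i) {
        return new Pref(itemIDs[i], values[i]);
    }

    int length() {
        return itemIDs.length;
    }

    public static void main(String[] args) {
        ParallelPreferenceArray prefs = new ParallelPreferenceArray(2);
        prefs.set(0, 101L, 3.0f);
        prefs.set(1, 102L, 4.5f);
        for (int i = 0; i < prefs.length(); i++) {
            Pref p = prefs.get(i);
            System.out.println(p.itemID + " = " + p.value);
        }
    }
}
```

Callers on hot paths would presumably use getItemID and getValue directly, leaving get for code that still wants an object.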

bq. So far these changes have reduced memory requirements by about 20% in my particular test
case, which is significant.

Significant, yes.  But surprisingly small.  I would have expected the raw data to be the vast
majority of your memory since all downstream measures should be aggregations of that. 

Where is the rest of the memory going?  Is it possible that the results should (can?) be computed,
written and forgotten?

> Reduce memory usage with smarter data structures
> ------------------------------------------------
>
>                 Key: MAHOUT-154
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-154
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>             Fix For: 0.2
>
>
> Memory usage remains an issue. This issue tracks two changes with API implications that
> could reduce memory requirements:
> - use float, not double, for preference values. It is terribly unlikely that a float
>   (4 bytes) is not enough precision to accurately represent user preferences, which are
>   typically like "3.0" or "4.5". Using float instead of an 8-byte double saves 4 bytes per
>   preference value, which is significant when loading tens of millions of prefs into memory
> - Preference[] is an inefficient way to store prefs, since it entails a great deal of
>   Preference object overhead (48 bytes per pref is needed, of which 36 is overhead (!)) Using
>   an abstraction like PreferenceArray which can use parallel arrays internally can cut at
>   least 12 of the 36 bytes of overhead out -- more if crazier data structures are used.
> So far these changes have reduced memory requirements by about 20% in my particular
> test case, which is significant.
> I am tracking this as an issue since like MAHOUT-151 it will entail API changes.
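
The per-pref figures in the description can be turned into a back-of-the-envelope estimate. The sketch below uses only the numbers quoted above (48 bytes per pref, 4 bytes saved by float, at least 12 bytes saved by parallel arrays). The pref count of 10 million and the class name PrefMemoryEstimate are hypothetical, chosen only to make the arithmetic concrete.

```java
// Hypothetical estimate built from the figures quoted in the issue description.
class PrefMemoryEstimate {

    static final long BYTES_PER_PREF_BEFORE = 48;  // from the issue: 48 bytes per pref
    static final long SAVED_BY_FLOAT = 4;          // from the issue: double -> float
    static final long SAVED_BY_ARRAYS = 12;        // from the issue: "at least 12" bytes

    static long bytesBefore(long prefCount) {
        return prefCount * BYTES_PER_PREF_BEFORE;
    }

    static long bytesAfter(long prefCount) {
        return prefCount * (BYTES_PER_PREF_BEFORE - SAVED_BY_FLOAT - SAVED_BY_ARRAYS);
    }

    // Percentage reduction in pref storage alone, independent of the pref count.
    static double reductionPercent() {
        return 100.0 * (SAVED_BY_FLOAT + SAVED_BY_ARRAYS) / BYTES_PER_PREF_BEFORE;
    }

    public static void main(String[] args) {
        long prefCount = 10_000_000L;  // assumption: illustrative data set size
        System.out.println("before: " + bytesBefore(prefCount) + " bytes");
        System.out.println("after:  " + bytesAfter(prefCount) + " bytes");
        System.out.println("reduction in pref storage: " + reductionPercent() + "%");
    }
}
```

If these per-pref figures hold, pref storage alone would shrink by roughly a third, so an overall reduction of about 20% would suggest that a sizable part of the heap is held by something other than the raw prefs. That is the question raised in the comment above.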

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

