mahout-dev mailing list archives

From Peng Cheng <pc...@uowmail.edu.au>
Subject Re: [jira] [Commented] (MAHOUT-1286) Memory-efficient DataModel, supporting fast online updates and element-wise iteration
Date Tue, 23 Jul 2013 22:06:39 GMT
That's exactly what I'm trying to do right now :) (I'm testing 
FastByIDArrayMap). But we probably have more problems than just HashMap: 
based on the heap dump analysis, PreferenceArray will probably be our 
next target. This is awesome, since your FactorizablePreferences didn't 
use it in the first place.

Yours Peng

On 13-07-23 05:46 PM, Sebastian Schelter wrote:
> IMHO you will always have memory issues if you try to provide constant-time
> random access. That's why I proposed to create a special memory-efficient
> DataModel for sequential access.
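Sebastian's point is that the guarantee of constant-time random access is what costs memory. Below is a minimal sketch of the sequential-access alternative, assuming a store that only supports appending and full element-wise scans. The class `SequentialPreferences` and the `PreferenceVisitor` callback are hypothetical names for illustration, not part of Mahout's API.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not a Mahout class): preferences stored as three
 * parallel primitive arrays, supporting only appends and sequential scans.
 * No per-preference objects are allocated.
 */
public class SequentialPreferences {

  /** Receives one preference as primitives, so no wrapper object is created. */
  public interface PreferenceVisitor {
    void visit(long userID, long itemID, float value);
  }

  private long[] userIDs = new long[16];
  private long[] itemIDs = new long[16];
  private float[] values = new float[16];
  private int size = 0;

  /** Appends a preference in amortized O(1); capacity doubles when full. */
  public void add(long userID, long itemID, float value) {
    if (size == userIDs.length) {
      int newCapacity = size * 2;
      userIDs = Arrays.copyOf(userIDs, newCapacity);
      itemIDs = Arrays.copyOf(itemIDs, newCapacity);
      values = Arrays.copyOf(values, newCapacity);
    }
    userIDs[size] = userID;
    itemIDs[size] = itemID;
    values[size] = value;
    size++;
  }

  public int size() {
    return size;
  }

  /** Visits every preference in insertion order. */
  public void forEach(PreferenceVisitor visitor) {
    for (int i = 0; i < size; i++) {
      visitor.visit(userIDs[i], itemIDs[i], values[i]);
    }
  }

  public static void main(String[] args) {
    SequentialPreferences prefs = new SequentialPreferences();
    prefs.add(1L, 10L, 4.5f);
    prefs.add(1L, 11L, 3.0f);
    prefs.add(2L, 10L, 5.0f);
    double[] sum = {0.0};
    prefs.forEach((u, i, v) -> sum[0] += v);
    System.out.println(prefs.size() + " preferences, sum = " + sum[0]); // 3 preferences, sum = 12.5
  }
}
```

Each preference costs 20 bytes of primitive payload here (two longs and a float), plus growth slack. Looking up one (user, item) pair would need a linear scan, which is exactly the random-access guarantee being traded away.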
>
>
> 2013/7/23 Peng Cheng (JIRA) <jira@apache.org>
>
>> [ https://issues.apache.org/jira/browse/MAHOUT-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717659#comment-13717659 ]
>>
>> Peng Cheng commented on MAHOUT-1286:
>> ------------------------------------
>>
>> Aye aye, I just did. It turns out that instances of
>> GenericUserPreferenceArray$PreferenceView have taken 1.7G. Quite unexpected, right?
>> Thanks a lot for the advice.
>> My next experiment will use GenericPreference[] directly; there will
>> be no more PreferenceArray.
>>
>>    Objects | Shallow Heap (bytes) | Retained Heap (bytes) | Class Name
>> -----------+----------------------+-----------------------+--------------------------------------------------------------------------------
>> 72,237,632 |        1,733,703,168 |      >= 1,733,703,168 | org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray$PreferenceView
>>    480,199 |          818,209,680 |        >= 818,209,680 | long[]
>>    480,190 |          410,563,592 |        >= 410,563,592 | float[]
>>     18,230 |          361,525,488 |      >= 2,443,647,088 | java.lang.Object[]
>>    480,189 |           15,366,048 |      >= 1,237,456,672 | org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray
>>     17,811 |              427,464 |      >= 2,092,416,104 | java.util.ArrayList
>>      2,150 |              272,632 |           >= 272,632 | char[]
>>        141 |               54,048 |            >= 54,048 | byte[]
>>      2,119 |               50,856 |           >= 271,920 | java.lang.String
>>        673 |               21,536 |            >= 38,104 | java.util.concurrent.ConcurrentHashMap$HashEntry
>>        229 |               14,656 |            >= 40,720 | java.net.URL
>>        344 |               11,008 |            >= 68,760 | java.util.HashMap$Entry
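The shallow sizes in the heap dump divide out evenly, so the per-object cost can be read straight off the table. Below is a small sketch of that arithmetic. It assumes the sizes are in bytes and that each view points at one long item ID plus one float value in the backing arrays. `HeapDumpArithmetic` is a made-up name for illustration.

```java
import java.util.Locale;

/** Hypothetical helper: recomputes per-object shallow sizes from the quoted heap dump. */
public class HeapDumpArithmetic {

  /** Average shallow size of one object, in bytes. */
  static double bytesPerObject(long shallowHeapBytes, long objectCount) {
    return (double) shallowHeapBytes / objectCount;
  }

  public static void main(String[] args) {
    // Rows copied from the heap-dump table.
    double perView = bytesPerObject(1_733_703_168L, 72_237_632L);
    double perUserArray = bytesPerObject(15_366_048L, 480_189L);
    // Assumed payload of one preference in the backing primitive arrays:
    // a long item ID plus a float value.
    int payload = Long.BYTES + Float.BYTES;

    System.out.printf(Locale.ROOT, "PreferenceView: %.1f bytes each%n", perView);
    System.out.printf(Locale.ROOT, "GenericUserPreferenceArray: %.1f bytes each%n", perUserArray);
    System.out.printf(Locale.ROOT, "preference payload: %d bytes%n", payload);
    System.out.printf(Locale.ROOT, "all views: %.2f GiB%n", 1_733_703_168L / (double) (1L << 30));
  }
}
```

Each view costs 24 bytes, which is twice the 12-byte payload of the preference it refers to. Each GenericUserPreferenceArray costs 32 bytes.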
>>
>>
>>> Memory-efficient DataModel, supporting fast online updates and element-wise iteration
>>> -------------------------------------------------------------------------------------
>>>                  Key: MAHOUT-1286
>>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-1286
>>>              Project: Mahout
>>>           Issue Type: Improvement
>>>           Components: Collaborative Filtering
>>>     Affects Versions: 0.9
>>>             Reporter: Peng Cheng
>>>             Assignee: Sean Owen
>>>    Original Estimate: 336h
>>>   Remaining Estimate: 336h
>>>
>>> Most DataModel implementations in the current CF component use hash maps to
>>> enable fast 2D indexing and updates. This is not memory-efficient for big
>>> data sets; e.g. the Netflix Prize dataset takes 11G of heap space as a
>>> FileDataModel.
>>> An improved DataModel implementation should use more compact data
>>> structures (like arrays). This trades a little time complexity in 2D
>>> indexing for a vast improvement in memory efficiency. In addition, online
>>> recommenders and online-to-batch converted recommenders will not be
>>> affected by this during training.
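The array-based idea in the issue description could look roughly like the following for a single user's row. This is a hypothetical sketch, not Mahout code. `SortedItemRow` is an illustrative name. Item IDs are kept sorted in a `long[]` with ratings in a parallel `float[]`, and lookups use `java.util.Arrays.binarySearch`. A get therefore costs O(log n) instead of a hash lookup's expected O(1), but there are no per-entry objects and no hash-table slack.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of one user's row in an array-based DataModel:
 * sorted item IDs in a long[], ratings in a parallel float[].
 */
public class SortedItemRow {
  private long[] itemIDs = new long[4];
  private float[] values = new float[4];
  private int size = 0;

  /** Returns the stored value, or Float.NaN if the item is absent. O(log n). */
  public float get(long itemID) {
    int idx = Arrays.binarySearch(itemIDs, 0, size, itemID);
    return idx >= 0 ? values[idx] : Float.NaN;
  }

  /** Inserts or overwrites. A new item shifts the tail, so O(n) worst case. */
  public void set(long itemID, float value) {
    int idx = Arrays.binarySearch(itemIDs, 0, size, itemID);
    if (idx >= 0) {
      values[idx] = value; // existing item: overwrite in place
      return;
    }
    int insertAt = -idx - 1; // binarySearch encodes the insertion point
    if (size == itemIDs.length) {
      itemIDs = Arrays.copyOf(itemIDs, size * 2);
      values = Arrays.copyOf(values, size * 2);
    }
    System.arraycopy(itemIDs, insertAt, itemIDs, insertAt + 1, size - insertAt);
    System.arraycopy(values, insertAt, values, insertAt + 1, size - insertAt);
    itemIDs[insertAt] = itemID;
    values[insertAt] = value;
    size++;
  }

  public int size() {
    return size;
  }

  public static void main(String[] args) {
    SortedItemRow row = new SortedItemRow();
    row.set(42L, 3.5f);
    row.set(7L, 4.0f);
    row.set(42L, 5.0f); // overwrite; size stays 2
    System.out.println(row.size() + " " + row.get(7L) + " " + row.get(42L) + " " + row.get(99L));
    // prints: 2 4.0 5.0 NaN
  }
}
```

Because inserting a new item shifts the tail of both arrays, writes are O(n) per row. Whether that still counts as "fast online updates" depends on how long the rows are.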
>>


