mahout-dev mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: Regarding Online Recommenders
Date Wed, 17 Jul 2013 20:58:17 GMT
Hi Peng,

I never wanted to discard the old interface, I just wanted to split it up.
I want to have a simple interface that only supports sequential access (and
allows for very memory-efficient implementations, e.g. by the use of
primitive arrays). DataModel should *extend* this interface and provide
sequential and random access (basically what it already does).

Then a recommender such as SGD could state that it only needs sequential
access to the preferences, and you could either feed it a DataModel (so we
don't break backwards compatibility) or a memory-efficient sequential-access
implementation.

Does that make sense to you?
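
To make the proposed split concrete, here is a rough sketch in Java. The interface and class names are illustrative only, not actual Mahout API: a minimal sequential-only contract, a random-access contract extending it, and a primitive-array-backed implementation of the sequential one.

```java
// Illustrative sketch only: these names are hypothetical, not Mahout API.

// Minimal contract: one sequential, index-based pass over all ratings.
interface SequentialPreferences {
  int numPreferences();
  float ratingAt(int index);
}

// The DataModel-style contract extends it with per-user random access.
interface RandomAccessPreferences extends SequentialPreferences {
  float[] ratingsOfUser(long userID);
}

// Memory-efficient sequential implementation backed by a primitive array.
class ArrayPreferences implements SequentialPreferences {
  private final float[] ratings;

  ArrayPreferences(float[] ratings) {
    this.ratings = ratings;
  }

  public int numPreferences() {
    return ratings.length;
  }

  public float ratingAt(int index) {
    return ratings[index];
  }

  public static void main(String[] args) {
    // An SGD-style learner only needs this kind of single pass.
    SequentialPreferences prefs = new ArrayPreferences(new float[] {4f, 3f, 5f});
    double sum = 0;
    for (int i = 0; i < prefs.numPreferences(); i++) {
      sum += prefs.ratingAt(i);
    }
    System.out.println("sum=" + sum);  // prints sum=12.0
  }
}
```

A recommender that declares a SequentialPreferences parameter would then accept either a full DataModel (once DataModel extends the interface) or a compact array-backed store like this.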


2013/7/17 Peng Cheng <pc175@uowmail.edu.au>

> I see, OK so we shouldn't use the old implementation. But I mean, the old
> interface doesn't have to be discarded. The discrepancy between your
> FactorizablePreferences and DataModel is that your model supports
> getPreferences(), which returns all preferences as an iterator, while
> DataModel supports a few old functions that return preferences for an
> individual user or item.
>
> My point is that it is not hard for each of them to implement what the
> other has: the old DataModel can implement getPreferences() just by a loop
> in an abstract class, and your new FactorizablePreferences can implement
> those old functions by a binary search that takes O(log n) time, or an
> interpolation search that takes O(log log n) time on average. The same goes
> for the online update. It would just be a matter of different speed and
> space, not a different interface standard; we could use the old unit tests,
> old examples, old everything. And we would be more flexible in writing
> ensemble recommenders.
>
> Just a few thoughts, I'll have to validate the idea first before creating
> a new JIRA ticket.
>
> Yours Peng
>
>
>
> On 13-07-16 02:51 PM, Sebastian Schelter wrote:
>
>> I completely agree. Netflix is less than one gigabyte in a smart
>> representation, so 12x more memory is a no-go. The techniques used in
>> FactorizablePreferences allow a much more memory-efficient representation,
>> tested on the KDD Music dataset, which is approx 2.5 times the size of
>> Netflix and fits into 3GB with that approach.
>>
>>
>> 2013/7/16 Ted Dunning <ted.dunning@gmail.com>
>>
>>> Netflix is a small dataset.  12G for that seems quite excessive.
>>>
>>> Note also that this is before you have done any work.
>>>
>>> Ideally, 100million observations should take << 1GB.
>>>
>>> On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng <pc175@uowmail.edu.au>
>>> wrote:
>>>
>>>> The second idea is indeed splendid; we should separate time-complexity-
>>>> first and space-complexity-first implementations. What I'm not quite sure
>>>> about is whether we really need to create two interfaces instead of one.
>>>> Personally, I think 12G heap space is not that high, right? Most new
>>>> laptops can already handle that (emphasis on laptop). And if we replace
>>>> the hash map (the culprit of the high memory consumption) with a
>>>> list/LinkedList, it would simply degrade lookup time complexity to O(n)
>>>> for a linear search, which is not too bad either. The current DataModel
>>>> is the result of careful thought and has undergone extensive testing; it
>>>> is easier to expand on top of it than to subvert it.
>>>>
>>>
>
>
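
As a rough sketch of the binary-search fallback discussed above (class and method names are hypothetical, not actual Mahout API): if the sequential store keeps its parallel primitive arrays sorted by user ID, the old per-user lookup can be emulated in O(log n + k) time, where k is the number of that user's ratings.

```java
import java.util.Arrays;

// Hypothetical sketch, not Mahout API: per-user random access emulated
// on top of a sequential store whose arrays are sorted by user ID.
class SortedPreferences {
  private final long[] userIDs;  // sorted ascending, one entry per rating
  private final long[] itemIDs;
  private final float[] ratings;

  SortedPreferences(long[] userIDs, long[] itemIDs, float[] ratings) {
    this.userIDs = userIDs;
    this.itemIDs = itemIDs;
    this.ratings = ratings;
  }

  // Binary search for any occurrence of userID, then widen to the
  // contiguous run of that user's entries: O(log n + k).
  float[] ratingsOfUser(long userID) {
    int hit = Arrays.binarySearch(userIDs, userID);
    if (hit < 0) {
      return new float[0];  // unknown user
    }
    int lo = hit;
    int hi = hit;
    while (lo > 0 && userIDs[lo - 1] == userID) lo--;
    while (hi < userIDs.length - 1 && userIDs[hi + 1] == userID) hi++;
    return Arrays.copyOfRange(ratings, lo, hi + 1);
  }

  public static void main(String[] args) {
    SortedPreferences prefs = new SortedPreferences(
        new long[] {1, 1, 2, 3, 3, 3},
        new long[] {10, 11, 10, 10, 12, 13},
        new float[] {4f, 3f, 5f, 2f, 1f, 4f});
    System.out.println(Arrays.toString(prefs.ratingsOfUser(3)));
    // prints [2.0, 1.0, 4.0]
  }
}
```

Note that Arrays.binarySearch makes no guarantee about which duplicate it lands on, hence the widening loops; an interpolation search could be swapped in for the O(log log n) average case Peng mentions.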
