mahout-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Why do userid & itemid have to be long?
Date Wed, 01 Jun 2011 05:07:11 GMT
UserID and ItemID are usually domain-level keys, not generated by the
DB. With some of the movie databases, you get tables of
"user/item/pref/time", "item/moviename/genre", and maybe
"user/geocode".
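
On Ted's point below that a String -> long lookup should cost about as
much as an array access: here is a minimal sketch of an in-memory
translation table, assuming all distinct IDs fit in RAM. The class name
InMemoryIdMapper and its methods are hypothetical; this is not the
MongoDataModel code from MAHOUT-705, only an illustration of the
hash-map approach.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout or the MAHOUT-705 patch.
// Assigns each distinct string ID the next sequential long value.
final class InMemoryIdMapper {
    // Forward map: string ID -> assigned long.
    private final Map<String, Long> toLong = new HashMap<>();
    // Reverse map: position i holds the string that was assigned long i.
    private final List<String> byIndex = new ArrayList<>();

    // Returns the long for this string ID, assigning a new one on first sight.
    long toLongId(String id) {
        Long existing = toLong.get(id);
        if (existing != null) {
            return existing;
        }
        long next = byIndex.size();
        toLong.put(id, next);
        byIndex.add(id);
        return next;
    }

    // Returns the original string for a previously assigned long.
    // Throws IndexOutOfBoundsException if the long was never assigned.
    String toStringId(long id) {
        return byIndex.get((int) id);
    }

    // Number of distinct IDs seen so far.
    int size() {
        return byIndex.size();
    }
}
```

One mapper per ID space (one for users, one for items) keeps the two
sets of longs independent. Each translation is a single hash lookup, so
the cost per ID does not grow with the number of IDs already seen.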

Lance

On Tue, May 31, 2011 at 9:51 PM, Mike Khristo <mikekhristo@gmail.com> wrote:
> Using the 0.6 snapshot + patch 705 (mongodatamodel) from jira (
> https://issues.apache.org/jira/browse/MAHOUT-705), and a test data set with
> ~300k rows like:
>
> "4cec0a2934ac9fbd2b040000","4d065d5434ac9f5227a12f00",118
>
> It's slowly doing the translations:
> INFO: [+++][MONGO-MAP] Adding Translation    Item ID:
> 4d57d54434ac9fd3570005a2 long_value: 145367
>
> It's doing about 30,000 per hour (and getting slower). That's 8.3/sec.
> The machine has 8 GB RAM and 4 virtual cores.
>
> With a test data set of 3M preferences, that would take >5 days, just for
> the translation.
>
> Open to ideas/suggestions/"a-ha"-moments. Thanks!
>
>
>
>
> On Tue, May 31, 2011 at 9:15 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
>> It makes the internals much cleaner to not repeat this conversion.
>>
>> But how is it that this is taking a long time?  String -> lookup should not
>> be much longer than an array access, especially if you use the Mahout
>> collections or one of the dictionary types.
>>
>> On Tue, May 31, 2011 at 7:50 PM, Mike Khristo <mikekhristo@gmail.com>
>> wrote:
>>
>> > Rather, how can I use string-based userids/itemids without having to deal
>> > with the slowness associated with mapping them to a long?
>> >
>> > In the MongoDataModel, for example, significant time/overhead goes into
>> > converting the unique IDs to long...  I'm still getting my head wrapped
>> > around Mahout, but this seems like a significant limitation. I have to
>> > assume there's some logic behind the decision to restrict them to long, but
>> > I didn't find anything about it in Mahout in Action or the list.
>> >
>> > Thanks.
>> >
>>
>



-- 
Lance Norskog
goksron@gmail.com
