mahout-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Why do userid & itemid have to be long?
Date Wed, 01 Jun 2011 05:13:08 GMT
Could it be doing lots of garbage collection? Have you monitored the
JVMs while this takes so long?
Also, you can give HashMap an initial capacity when you create it, so
it knows up front that you'll be adding lots of entries. This might help
it run faster. But, yes, this is bizarrely slow.
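
To illustrate the HashMap point, here is a minimal sketch of a pre-sized map that hands out sequential long IDs for string IDs. The class name IdTranslator and its methods are hypothetical, not part of Mahout or the MongoDataModel patch, and the 500,000 figure is just the item count mentioned in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Mahout class): maps string IDs to sequential longs.
class IdTranslator {
    private final Map<String, Long> toLong;
    private long next = 0L;

    IdTranslator(int expectedEntries) {
        // HashMap resizes once its size exceeds capacity * load factor (0.75 by
        // default), so ask for enough capacity to hold expectedEntries up front.
        int initialCapacity =
            (int) Math.min(Integer.MAX_VALUE, (long) (expectedEntries / 0.75) + 1);
        this.toLong = new HashMap<>(initialCapacity);
    }

    // Returns the long already assigned to this string ID, or assigns the next one.
    long translate(String id) {
        Long existing = toLong.get(id);
        if (existing != null) {
            return existing;
        }
        long assigned = next++;
        toLong.put(id, assigned);
        return assigned;
    }

    int size() {
        return toLong.size();
    }

    public static void main(String[] args) {
        IdTranslator items = new IdTranslator(500_000);
        System.out.println(items.translate("4d065d5434ac9f5227a12f00")); // prints 0
        System.out.println(items.translate("4d57d54434ac9fd3570005a2")); // prints 1
        System.out.println(items.translate("4d065d5434ac9f5227a12f00")); // prints 0
    }
}
```

Pre-sizing only avoids the intermediate rehashes while the map grows. It would not by itself explain a rate of a few translations per second, so the garbage collection question above is still worth checking.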

On Tue, May 31, 2011 at 10:08 PM, Chris Schilling
<chris@thecleversense.com> wrote:
> I have a test set of 6M preferences (500k users, 500k items).  I recently
> switched my infrastructure to use Long sequential ids for users and items.
> Before this we were using Strings.  I was able to read in a map file for
> userIds and itemIds into a Java HashMap.  Conversions took a negligible amount
> of time.  This sounds insane for only 5M prefs.
>
>
>
> On Tue, May 31, 2011 at 9:51 PM, Mike Khristo <mikekhristo@gmail.com> wrote:
>
>> Using the 0.6 snapshot + patch 705 (mongodatamodel) from jira (
>> https://issues.apache.org/jira/browse/MAHOUT-705), and a test data set with
>> ~300k rows like:
>>
>> "4cec0a2934ac9fbd2b040000","4d065d5434ac9f5227a12f00",118
>>
>> It's slowly doing the translations:
>> INFO: [+++][MONGO-MAP] Adding Translation    Item ID:
>> 4d57d54434ac9fd3570005a2 long_value: 145367
>>
>> It's doing about 30,000 per hour (and getting slower). That's 8.3/sec.
>> 8G ram, 4 virtual cores
>>
>> With a test data set of 3M preferences, that would take >5 days, just for
>> the translation.
>>
>> Open to ideas/suggestions/"a-ha"-moments. Thanks!
>>
>>
>>
>>
>> On Tue, May 31, 2011 at 9:15 PM, Ted Dunning <ted.dunning@gmail.com>
>> wrote:
>>
>> > It makes the internals much cleaner to not repeat this conversion.
>> >
>> > But how is it that this is taking a long time?  String -> lookup should not
>> > be much longer than an array access, especially if you use the Mahout
>> > collections or one of the dictionary types.
>> >
>> > On Tue, May 31, 2011 at 7:50 PM, Mike Khristo <mikekhristo@gmail.com>
>> > wrote:
>> >
>> > > Rather, how can I use string-based userid/itemid's without having to deal
>> > > with the slowness associated with mapping them to a long?
>> > >
>> > > In the MongoDataModel, for example, significant time/overhead goes into
>> > > converting the unique IDs to long...  I'm still getting my head wrapped
>> > > around Mahout, but this seems like a significant limitation. I have to
>> > > assume there's some logic behind the decision to restrict them to long, but
>> > > I didn't find anything about it in Mahout in Action or the list.
>> > >
>> > > Thanks.
>> > >
>> >
>>
>



-- 
Lance Norskog
goksron@gmail.com
