mahout-user mailing list archives

From Manuel Blechschmidt <Manuel.Blechschm...@gmx.de>
Subject Re: UUID based user IDs
Date Thu, 02 Aug 2012 07:45:30 GMT
Hi Matt,
when you create your preferences from your data (typically millions of them),
you always have to convert the UUIDs to longs first.

The given example does that on the fly each time the recommender process starts
and keeps the mapping in memory.

When receiving recommendations you have to convert the long id back to the UUID:

...
List<RecommendedItem> items = recommender.recommend(thing2long.toLongID(personName), 10);
for (RecommendedItem item : items) {
    recommendations.add(thing2long.toStringID(item.getItemID()));
}
...
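
To make the round trip concrete, here is a minimal, JDK-only sketch of the idea. It is not Mahout's code: the class IdMappingSketch and its methods register() and lookup() are hypothetical names, and the hashing scheme (first 8 bytes of the MD5 digest read as a big-endian long) is an assumption about what the migrator does, based on the md5 remark quoted further down in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for an in-memory ID mapping; not a Mahout class. */
final class IdMappingSketch {

    // Reverse mapping (long -> original string), kept only in memory.
    private final Map<Long, String> longToString = new ConcurrentHashMap<>();

    /** Assumed scheme: first 8 bytes of the MD5 digest as a big-endian long. */
    static long hash(String value) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (md5[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Hashes the string ID and remembers it so it can be looked up later. */
    long register(String stringID) {
        long id = hash(stringID);
        longToString.put(id, stringID);
        return id;
    }

    /** Returns the original string ID, or null if it was never registered. */
    String lookup(long longID) {
        return longToString.get(longID);
    }

    public static void main(String[] args) {
        IdMappingSketch mapping = new IdMappingSketch();
        String uuid = "123e4567-e89b-12d3-a456-426614174000"; // example value
        long userID = mapping.register(uuid);   // use this long in the preferences
        System.out.println(userID + " -> " + mapping.lookup(userID));
    }
}
```

Because the reverse mapping lives only in memory, every ID has to be registered again after a restart, which matches the on-the-fly behaviour described above.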

I would recommend that you just clone the example and play around with it. You can also run
the test cases with a debugger and have a look at what is happening.

git clone git://github.com/ManuelB/facebook-recommender-demo.git
cd facebook-recommender-demo
mvn install
mvn embedded-glassfish:run 

The github project contains an eclipse configuration so it should be easily loadable.

/Manuel

On 02.08.2012, at 04:40, Matt Mitchell wrote:

> Thanks Manuel, that's very helpful. So you're saying I can just use
> MemoryIDMigrator, even after my preferences have been created with UUID
> values? Or, should I create my preferences using the MemoryIDMigrator?
> 
> - Matt
> 
> 
> On Wed, Aug 1, 2012 at 8:49 PM, Manuel Blechschmidt
> <Manuel.Blechschmidt@gmx.de> wrote:
>> Hello Matt,
>> 
>> On 01.08.2012, at 22:40, Matt Mitchell wrote:
>> 
>>> Thanks Sean! That all makes sense. Would you mind recommending a
>>> hashing function for this? Is there something in Mahout I could use?
>> 
>> The following class uses a string-to-long mapping based on a MemoryIDMigrator:
>> 
>> https://github.com/ManuelB/facebook-recommender-demo/blob/master/src/main/java/de/apaxo/bedcon/FacebookRecommender.java
>> 
>> Internally, Mahout uses part of the MD5 hash, which can, for example, be expressed
>> directly in SQL:
>> 
>> cast(conv(substring(md5([column name]), 1, 16),16,10) as signed)
>> 
>> Javadoc can be found here:
>> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/model/IDMigrator.html
>> 
>> /Manuel
>> 
>>> 
>>> - Matt
>>> 
>>> On Wed, Aug 1, 2012 at 4:34 PM, Sean Owen <srowen@gmail.com> wrote:
>>>> Yep, just hash to a long, from UUID or String or whatever. The occasional
>>>> collision does not cause a real problem. If you mix the tastes of two users
>>>> or items once in a billion times, the overall results will hardly be
>>>> different.
>>>> 
>>>> You have to maintain the reverse mapping of course. Look at the IDMigrator
>>>> class for a little help there.
>>>> 
>>>> You can rewrite to use UUID or String, but believe me, it will be an
>>>> immense amount of change and make things much slower. It used to work this
>>>> way for recommenders in about 2006 and the Object overhead and GC pressure
>>>> was by far the bottleneck. That's why it's all long now.
>>>> 
>>>> On Wed, Aug 1, 2012 at 9:29 PM, Matt Mitchell <goodieboy@gmail.com> wrote:
>>>> 
>>>>> Question about dealing with UUIDs as Mahout user IDs. I'm considering
>>>>> ways to deal with these values:
>>>>> 
>>>>> 1. use getLeastSignificantBits
>>>>> 2. re-map to a database auto-increment number (this would take a very
>>>>> long time to do?)
>>>>> 3. customize mahout so that it accepts UUIDs as user IDs
>>>>> 
>>>>> Any feedback here? If I went with #3 (seems the safest), how would I do
>>>>> this, and what are the consequences?
>>>>> 
>>>>> The user count is in the millions.
>>>>> 
>>>>> Thanks!
>>>>> 
>> 
>> --
>> Manuel Blechschmidt
>> M.Sc. IT Systems Engineering
>> Dortustr. 57
>> 14467 Potsdam
>> Mobil: 0173/6322621
>> Twitter: http://twitter.com/Manuel_B
>> 

-- 
Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B

