mahout-user mailing list archives

From Chantal Ackermann <>
Subject Re: Cooccurrence to align different categorization systems (many to many occurrence)
Date Mon, 19 Jul 2010 08:27:32 GMT
Hi Sean,
hi Ted,
hi Sebastian,

thanks a lot for all those detailed answers. I'll need some time to
digest the technical details, I'm afraid. Right now, Sean's suggestion of
thinking of categories as users and using the recommendation classes for
the task is the easiest for me to understand.

It's not exactly the same situation, though, unless one thinks of two
user communities where the recommendations presented to a user of
Community 1 should come from Community 2.

Each item is categorized in both systems, but an item may have zero
categories in either system. There are a few hundred categories in each
system.

The data is a list of lines with the following structure:
<ITEM (ID)> [List of categories System 1] [List of categories System 2]
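A minimal parsing sketch for one such line. The concrete syntax is an assumption: an item ID, then two bracketed, comma-separated category lists, e.g. `item42 [Sports, Local News] []`, where empty brackets stand for an item with zero categories in that system. `ItemLineParser` and `ItemRecord` are hypothetical names, and the `record` needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ItemLineParser {

    /** One input line: the item ID plus its categories in System 1 and System 2. */
    record ItemRecord(String itemId, List<String> system1, List<String> system2) {}

    // Assumed syntax: "<id> [a, b] [c]"; either bracketed list may be empty ("[]").
    private static final Pattern LINE = Pattern.compile(
            "^\\s*([^\\s\\[]+)\\s*\\[([^\\]]*)\\]\\s*\\[([^\\]]*)\\]\\s*$");

    static ItemRecord parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unparseable line: " + line);
        }
        return new ItemRecord(m.group(1), splitCategories(m.group(2)), splitCategories(m.group(3)));
    }

    /** Splits a comma-separated list; a blank list means zero categories. */
    private static List<String> splitCategories(String raw) {
        if (raw.isBlank()) {
            return Collections.emptyList();
        }
        List<String> out = new ArrayList<>();
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ItemRecord r = parse("item42 [Sports, Local News] []");
        System.out.println(r.itemId() + " " + r.system1() + " " + r.system2());
    }
}
```

Running `main` prints `item42 [Sports, Local News] []`. Category names that themselves contain commas or closing brackets would need a different delimiter.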

The approach I'll take:
1. Normalize all the category strings and give them unique numeric
identifiers (unique across both systems, in distinct ranges).
2. Walk through the list and, per item, take each of its categories
(= user) and create a BooleanPreference for that user and item pair.
3. For each category from System 1, request similar categories (= user
similarity) from System 2. I'll probably have to request a mixed list
(both systems) and filter out the ones from System 1.
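The three steps can be sketched in plain Java without any Mahout classes, assuming all data fits in memory. Several details are assumptions, not part of the plan above: the normalization rule (trim, collapse whitespace, lower-case), the hypothetical `SYSTEM2_OFFSET` that separates the two ID ranges, and the Tanimoto (Jaccard) coefficient as the user similarity, which is one common choice for boolean data. In Mahout itself the corresponding pieces would be a boolean-preference data model such as `GenericBooleanPrefDataModel` and a user similarity such as `TanimotoCoefficientSimilarity` or `LogLikelihoodSimilarity`; the class and method names below are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class CrossSystemCategorySimilarity {

    // Hypothetical ID ranges: System 1 IDs start at 0, System 2 IDs at this offset.
    static final long SYSTEM2_OFFSET = 1_000_000L;

    private final Map<String, Long> system1Ids = new HashMap<>();
    private final Map<String, Long> system2Ids = new HashMap<>();
    private final Map<Long, String> namesById = new HashMap<>();
    // Category ID ("user") -> IDs of the items it was assigned to (boolean preferences).
    private final Map<Long, Set<Long>> itemsByCategory = new HashMap<>();

    /** Step 1: assumed normalization rule: trim, collapse whitespace, lower-case. */
    static String normalize(String category) {
        return category.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Returns the category's numeric ID, assigning the next free one in its system's range. */
    private long idFor(String category, boolean system1) {
        Map<String, Long> ids = system1 ? system1Ids : system2Ids;
        String key = normalize(category);
        Long id = ids.get(key);
        if (id == null) {
            id = (system1 ? 0L : SYSTEM2_OFFSET) + ids.size();
            ids.put(key, id);
            namesById.put(id, key);
        }
        return id;
    }

    /** Step 2: records one boolean preference per (category, item) pair; empty lists add nothing. */
    void addItem(long itemId, List<String> system1Cats, List<String> system2Cats) {
        for (String c : system1Cats) {
            itemsByCategory.computeIfAbsent(idFor(c, true), k -> new HashSet<>()).add(itemId);
        }
        for (String c : system2Cats) {
            itemsByCategory.computeIfAbsent(idFor(c, false), k -> new HashSet<>()).add(itemId);
        }
    }

    /** Tanimoto (Jaccard) coefficient |A ∩ B| / |A ∪ B| of two item sets. */
    static double tanimoto(Set<Long> a, Set<Long> b) {
        int common = 0;
        for (Long item : a) {
            if (b.contains(item)) {
                common++;
            }
        }
        int union = a.size() + b.size() - common;
        return union == 0 ? 0.0 : (double) common / union;
    }

    /** Step 3: the System 2 categories most similar to a System 1 category, best first. */
    List<String> similarInSystem2(String system1Category, int howMany) {
        Long id = system1Ids.get(normalize(system1Category));
        if (id == null) {
            return List.of();
        }
        Set<Long> items = itemsByCategory.get(id);
        List<Map.Entry<Long, Double>> scored = new ArrayList<>();
        for (Map.Entry<Long, Set<Long>> e : itemsByCategory.entrySet()) {
            if (e.getKey() < SYSTEM2_OFFSET) {
                continue; // drop System 1 categories from the mixed list
            }
            double score = tanimoto(items, e.getValue());
            if (score > 0) {
                scored.add(Map.entry(e.getKey(), score));
            }
        }
        scored.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(howMany, scored.size()); i++) {
            result.add(namesById.get(scored.get(i).getKey()));
        }
        return result;
    }

    public static void main(String[] args) {
        CrossSystemCategorySimilarity s = new CrossSystemCategorySimilarity();
        s.addItem(1, List.of("Sports"), List.of("Athletics", "Leisure"));
        s.addItem(2, List.of("sports "), List.of("Athletics"));
        s.addItem(3, List.of("Politics"), List.of("Leisure"));
        s.addItem(4, List.of(), List.of("Athletics")); // no System 1 category
        System.out.println(s.similarInSystem2("SPORTS", 2));
    }
}
```

With the toy data in `main`, `sports` shares two of three items with `athletics` and one of three with `leisure`, so the call prints `[athletics, leisure]`. Scanning every System 2 category for each query is a brute-force choice that should still be cheap with a few hundred categories per system.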

I'll keep you posted. If you have more tips or things I should take
into account, or if you think this approach won't return any decent
results, I'd be glad if you could share them.

