mahout-user mailing list archives

From Ted Dunning <>
Subject Re: Tastify
Date Tue, 16 Jun 2009 01:18:49 GMT
Hierarchical modeling techniques work well on structures like this if you
have good resolution of your meta-data.  Resolving and disambiguating artist
and track names can be difficult unless you have total control over the
meta-data source.

The basic idea is that you model an artist as a distribution over "concept
space", which is just a fancy name for latent variables you don't plan to
understand.  Then an album is sampled from the artist and is another
distribution, and finally a track is sampled from the album.  This is similar
to the way that in LDA, documents and words are distributions over your
latent concept variables.  Specific meanings are chosen at each point in a
document, and the word you observe is chosen based on the concept at that
point.
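
As a rough illustration of the artist -> album -> track chain, here is a
minimal sketch.  It assumes each level is a Dirichlet draw centered on its
parent, which is only one possible choice and not something fixed by the
description above.  The names N_CONCEPTS, CONCENTRATION, dirichlet and
sample_child are all made up for this example.

```python
import random

N_CONCEPTS = 5        # hypothetical size of the latent "concept space"
CONCENTRATION = 50.0  # hypothetical; larger keeps a child closer to its parent


def dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_child(parent, concentration, rng):
    """Sample a child distribution whose mean is (roughly) the parent.

    The small floor on each parameter is a numerical guard added for this
    sketch, because gammavariate requires strictly positive arguments.
    """
    alphas = [max(concentration * p, 1e-3) for p in parent]
    return dirichlet(alphas, rng)


rng = random.Random(0)

# Artist: a distribution over latent concepts (uniform Dirichlet prior assumed).
artist = dirichlet([1.0] * N_CONCEPTS, rng)
# Album: sampled from the artist, and itself a distribution over concepts.
album = sample_child(artist, CONCENTRATION, rng)
# Track: sampled from the album in the same way.
track = sample_child(album, CONCENTRATION, rng)

for name, dist in (("artist", artist), ("album", album), ("track", track)):
    print(name, [round(p, 3) for p in dist])
```

With a large concentration, tracks stay close to their album and albums close
to their artist; a small value lets each level wander further from its parent.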

Since you only observe which word appears in which document, you have to
reverse-engineer what the latent concepts might have been by getting a
compromise between the word and document distributions.
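
To make the "compromise" concrete: if the distributions were known, the
probability of each concept behind one observed word would be proportional
to the document's weight on that concept times that concept's probability of
emitting the word.  The sketch below shows just this single step with made-up
numbers; a real fit would also have to estimate the distributions themselves.
The function name and the toy values are hypothetical.

```python
def concept_posterior(doc_concepts, word_given_concept, word):
    """Return p(concept | word, document) for one observed word.

    doc_concepts[z]          -- p(concept z | document)
    word_given_concept[z][w] -- p(word w | concept z)
    """
    unnormalized = [
        p_z * word_given_concept[z][word]
        for z, p_z in enumerate(doc_concepts)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Toy numbers: the document leans toward concept 0, but the observed word
# "guitar" is much more likely under concept 1.
doc_concepts = [0.7, 0.3]
word_given_concept = [
    {"guitar": 0.1, "synth": 0.9},
    {"guitar": 0.8, "synth": 0.2},
]

posterior = concept_posterior(doc_concepts, word_given_concept, "guitar")
print([round(p, 3) for p in posterior])
```

Here the word evidence outweighs the document's preference, so concept 1 ends
up more probable even though the document favors concept 0.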

In your case, you have a simpler generative model, but similar techniques
should apply.

On Sat, Jun 13, 2009 at 8:53 AM, Karl Wettin <> wrote:

> I hope that some semi-sophisticated Album, Track and ArtistSimilarity can
> be used to improve the results.
> Perhaps it's a good idea to have Playlist, Album and Artist implemented as
> Item too.

Ted Dunning, CTO
