mahout-user mailing list archives

From Jeff Eastman
Subject Re: CardinalityException in DirichletDriver
Date Tue, 19 Jan 2010 23:16:22 GMT
Hi Ted,

OK, from this and from looking at your code, here is what I get:

L1Model has a single, sparse coefficient vector M[t] where each
coefficient is the probability of that term being present in the model.
As (TF-IDF?) data values X[t] are scanned, the pdf(X) for each model
would be exp(- ManhattanDistanceMeasure(M, X)). The list of pdfs times
the mixture probabilities is then sampled as a multinomial, which selects
a particular model from the list of available models. When the selected
model then observes X, M=M+X and a count of observed values is
incremented. When computeParameters() is called, presumably M is
normalized (regularized?) and then sampled somehow to become the
posterior model for the next iteration.
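
To make the steps in the paragraph above concrete, here is a minimal Python sketch. It is not Mahout code: the class L1Model, the helpers manhattan() and assign(), and the dict-based sparse vectors are all hypothetical stand-ins, and "normalized" is assumed to mean dividing M by the observation count.

```python
import math
import random


def manhattan(m, x):
    """L1 distance between two sparse vectors stored as {term: value} dicts."""
    return sum(abs(m.get(t, 0.0) - x.get(t, 0.0)) for t in set(m) | set(x))


class L1Model:
    """Hypothetical stand-in for the L1Model discussed above (not Mahout's API)."""

    def __init__(self, coefficients=None):
        self.m = dict(coefficients or {})  # sparse coefficient vector M
        self.count = 0                     # number of observed values

    def pdf(self, x):
        # Unnormalized density: exp(-ManhattanDistance(M, X)).
        return math.exp(-manhattan(self.m, x))

    def observe(self, x):
        # M = M + X, and increment the observation count.
        for t, v in x.items():
            self.m[t] = self.m.get(t, 0.0) + v
        self.count += 1

    def compute_parameters(self):
        # Assumption: "normalized" means dividing M by the observation count.
        if self.count:
            self.m = {t: v / self.count for t, v in self.m.items()}
        self.count = 0  # start the next iteration with a fresh count


def assign(models, mixture, x, rng=random):
    """Sample a model index from the multinomial over mixture[k] * pdf_k(x)."""
    weights = [p * model.pdf(x) for p, model in zip(mixture, models)]
    return rng.choices(range(len(models)), weights=weights)[0]
```

One caveat with this sketch: exp(-distance) underflows to zero for very distant vectors, and the sampling call fails if every weight is zero, so a real implementation would probably work with log densities.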

L1ModelDistribution needs to compute a list of models from its prior and 
posterior distributions. What is known about each prior model? M[t] 
should have some non-zero coefficients but we don't know which ones? 
Seems like we could pick a few at random. Even if they are all identical 
with empty Ms, the multinomial will still force the data values into 
different models and, after the iteration is over, the models will all 
be different and will diverge from each other as they (hopefully) 
converge upon a description of the corpus. That's a little like what 
kMeans does with random initial clusters and how Dirichlet works with 
NormalModelDistributions (all prior models are identical with zero mean 
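
The point about identical, empty prior models can be sketched as follows. This is a toy illustration and not Mahout code: one_iteration(), l1_distance() and the dict-based sparse vectors are hypothetical, and the mixture is assumed to be uniform. Because every empty model gives the same pdf for a given X, the multinomial draw is effectively uniform, yet each model ends the pass with its own accumulated totals.

```python
import math
import random


def l1_distance(m, x):
    """L1 distance between two sparse vectors stored as {term: value} dicts."""
    return sum(abs(m.get(t, 0.0) - x.get(t, 0.0)) for t in set(m) | set(x))


def one_iteration(docs, k, seed=0):
    """Assign each document to one of k identical, empty prior models."""
    rng = random.Random(seed)
    prior = [{} for _ in range(k)]    # identical prior models with empty M
    totals = [{} for _ in range(k)]   # per-model running sums of observed X
    counts = [0] * k                  # per-model observation counts
    for x in docs:
        # Uniform mixture (1/k) times pdf; equal across models for empty priors.
        weights = [math.exp(-l1_distance(m, x)) / k for m in prior]
        i = rng.choices(range(k), weights=weights)[0]
        for t, v in x.items():
            totals[i][t] = totals[i].get(t, 0.0) + v
        counts[i] += 1
    return totals, counts
```

Every document lands in exactly one model, so the per-model totals partition the corpus; with more than a handful of documents the models will almost certainly differ after the first pass.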

This has a lot of question marks in it but I'm pressing send anyhow,

Ted Dunning wrote:
> On Tue, Jan 19, 2010 at 10:58 AM, Jeff Eastman wrote:
>> Looking in MAHOUT-228-3.patch, I don't see any sparse vectorizer. Did you
>> have another patch in mind?
> There should have been one.  Let me check to figure out the name.
>> I'm trying to wrap my mind around "L-1 model distribution".
> For the classifier learning, what we have is a prior distribution for
> classifiers that has probability proportional to exp(- sum(abs(w_i))).  The
> log of this probability is -sum(abs(w_i)) = -L_1(w) (up to an additive
> constant), which gives the name.
> This log probability is what is used as a regularization term in the
> optimization of the classifier.
> It isn't obvious from this definition, but this prior/regularizer has the
> effect of preferring sparse models (for classification).  Where L_2 priors
> prefer lots of small weights in ambiguous conditions because the penalty on
> large coefficients is so large, L_1 priors prefer to focus the weight on one
> or a few larger coefficients.
>> .... Would an L-1 model vector only have integer-valued elements?
> In the sense that 0 is an integer, yes.  :-)
> But what it prefers is zero valued coefficients.
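
The contrast between L_1 and L_2 priors described in the quoted text can be illustrated with a small Python sketch. It is not taken from the patch under discussion: the function names and the penalty strength lam are hypothetical, and the two shrinkage steps are the standard proximal updates for the penalties lam * abs(w) and lam * w^2 respectively.

```python
import math


def log_prior_l1(w):
    """Log of the L_1 prior, up to an additive constant: -sum(abs(w_i))."""
    return -sum(abs(wi) for wi in w)


def log_prior_l2(w):
    """Log of an L_2 (Gaussian-style) prior, up to a constant: -sum(w_i^2)."""
    return -sum(wi * wi for wi in w)


def soft_threshold(w, lam):
    """Proximal step for lam * abs(w): small coefficients become exactly zero."""
    return [math.copysign(max(abs(wi) - lam, 0.0), wi) for wi in w]


def l2_shrink(w, lam):
    """Proximal step for lam * w^2: every coefficient is scaled, none zeroed."""
    return [wi / (1.0 + 2.0 * lam) for wi in w]
```

For the same total weight, the L_1 log prior scores the vectors [1, 0, 0, 0] and [0.25, 0.25, 0.25, 0.25] equally, while the L_2 log prior favors the spread-out one. The L_1 update sets small coefficients exactly to zero, whereas the L_2 update only scales them down.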
