mahout-user mailing list archives

From Benson Margulies <bimargul...@gmail.com>
Subject Re: Logistic Regression Tutorial
Date Thu, 28 Apr 2011 21:07:01 GMT
Thanks, all. I get frustrated really fast when trying to read a PDF.
I guess I'm a fossil.

On Thu, Apr 28, 2011 at 4:54 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> The TrainNewsGroups class does this, though not quite as nicely as possible (it
> avoids the TextValueEncoder).
>
> I will post a simplified example on github that I just worked up for RCV1.
>
>
>
> On Thu, Apr 28, 2011 at 1:32 PM, Chris Schilling <chris@cellixis.com> wrote:
>
>> Benson,
>>
>> Chapters 14 and 15 discuss the 20 newsgroups classification example using
>> bag-of-words.  In this implementation of LR, you have to manually create the
>> feature vectors while iterating through the files.  The features are hashed
>> into a vector of predetermined length.  The examples are very clear and easy
>> to set up.  I can send you some code I wrote for a similar problem if it will
>> help.
>>
>> Chris
>>
>> On Apr 28, 2011, at 1:24 PM, Benson Margulies wrote:
>>
>> > Chris,
>> >
>> > I'm looking at a recently purchased copy of MIA.
>> >
>> > The LR example is all about the donut file, which has features that
>> > don't look even remotely like a full-up bag-of-words vector.
>> >
>> > I'm missing the point of connection between the vectorization process
>> > (with which we have some experience here, from running canopy/k-means)
>> > and the LR example. It's probably some simple principle that I'm failing
>> > to grasp.
>> >
>> > --benson
>> >
>> >
>> > On Thu, Apr 28, 2011 at 4:02 PM, Chris Schilling <chris@cellixis.com>
>> wrote:
>> >> Benson,
>> >>
>> >> The latest chapters in Mahout in Action cover document classification
>> using LR very well.
>> >>
>> >> Chris
>> >>
>> >>
>> >> On Apr 28, 2011, at 12:55 PM, Benson Margulies wrote:
>> >>
>> >>> Mike,
>> >>>
>> >>> in the time available for the experiment I want to perform, all I can
>> >>> imagine doing is turning each document into a bag-of-words feature
>> >>> vector. So, I want to run the pipeline of lucene->vectors->... and
>> >>> train a model. I confess that I don't have the time to try to absorb
>> >>> the underlying math; besides, I have some co-workers who can help me
>> >>> with that. My problem is entirely plumbing at this point.
>> >>>
>> >>> --benson
>> >>>
>> >>>
>> >>> On Thu, Apr 28, 2011 at 3:52 PM, Mike Nute <mike.nute@gmail.com>
>> wrote:
>> >>>> Benson,
>> >>>>
>> >>>> Lecture 3 in this one is a good intro to the logit model:
>> >>>>
>> >>>>
>> http://see.stanford.edu/see/lecturelist.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1
>> >>>>
>> >>>> The lecture notes are pretty solid too so that might be faster.
>> >>>>
>> >>>> The short version: logistic regression is a GLM with the inverse link
>> >>>> f^-1(x) = 1/(1+e^(-xB)) and a binomial likelihood function.  You can
>> >>>> fit it with either batch or stochastic gradient descent.
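[The stochastic-gradient-descent fitting Mike mentions comes down to a very small update rule: for each example, the gradient of the log-loss with respect to the weights is (p - y) * x. Here is a bare-bones sketch in plain Java under that assumption; Mahout's OnlineLogisticRegression adds learning-rate annealing and regularization on top of this, and the class name below is invented for the example.]

```java
public class LogisticSgd {
    // The inverse link (sigmoid): p = 1 / (1 + e^(-z))
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // One SGD step on a single (x, y) example with log-loss.
    // The per-example gradient w.r.t. beta is (p - y) * x.
    static void sgdStep(double[] beta, double[] x, int y, double learningRate) {
        double z = 0.0;
        for (int i = 0; i < beta.length; i++) {
            z += beta[i] * x[i];
        }
        double p = sigmoid(z);
        for (int i = 0; i < beta.length; i++) {
            beta[i] -= learningRate * (p - y) * x[i];
        }
    }

    public static void main(String[] args) {
        double[] beta = new double[2];
        double[][] xs = {{1, 0}, {1, 1}};  // first column is a bias term
        int[] ys = {0, 1};
        // "Stochastic" here just means one example per update,
        // instead of summing the gradient over the whole batch.
        for (int epoch = 0; epoch < 1000; epoch++) {
            for (int i = 0; i < xs.length; i++) {
                sgdStep(beta, xs[i], ys[i], 0.5);
            }
        }
        System.out.printf("p(y=1 | x=[1,1]) = %.3f%n",
                          sigmoid(beta[0] + beta[1]));
    }
}
```

[Batch gradient descent is the same update with the gradients summed over all examples before each step; SGD is what makes the online, one-pass training in Mahout's SGD classifiers possible.]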
>> >>>>
>> >>>> I've never done document classification before, though, so I'm not
>> >>>> much help with more complicated things like choosing the feature
>> >>>> vector.
>> >>>>
>> >>>> Good Luck,
>> >>>> Mike Nute
>> >>>>
>> >>>> On Thu, Apr 28, 2011 at 3:35 PM, Benson Margulies <
>> bimargulies@gmail.com>wrote:
>> >>>>
>> >>>>> Is there a logistic regression tutorial in the house? I've got a
>> >>>>> stack of files (Arabic ones, no less) and I want to train and
>> >>>>> score a classifier.
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Michael Nute
>> >>>> Mike.Nute@gmail.com
>> >>>>
>> >>
>> >>
>>
>>
>
