mahout-dev mailing list archives

From "Robin Anil" <robin.a...@gmail.com>
Subject Re: CNB: Learning from Huge Datasets
Date Thu, 31 Jul 2008 20:06:00 GMT
On Fri, Aug 1, 2008 at 1:30 AM, Grant Ingersoll <gsingers@apache.org> wrote:

>
> On Jul 31, 2008, at 3:31 PM, Robin Anil wrote:
>
>>>> Right now I have made sure that the output of the Trainer creates some
>>>> values, which are then used in the final CBayes equation calculated in
>>>> the getWeight(feature, label) function. To turn it into a Bayes
>>>> classifier with all the weight and length normalization, but without the
>>>> CBayes complexity, only a few code changes in the Model are needed.
>>>>
>>>> Should I go ahead and change the Bayes classifier in the next patch?
>>>>
>>>>
>>> +1  Do what you need to do.  What do you think about moving towards a
>>> matrix model, though, and using the proposed Matrix labels?
>>>
>>
>>
>> I did not get this part. How are we going to turn it into a matrix model? Is
>> it explained in the other threads?
>>
>
>
> Well, we do all this string concatenation to build up the model, just seems
> like we could work off of a matrix, but no worries, let's get in what we
> have and then we can work on optimizing, etc.
>
Yes, that makes a lot of sense. But for that, a common set of readers for
text, real numbers, and images has to be written.
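
To make the two approaches concrete, here is a minimal sketch contrasting a string-keyed model with a matrix-indexed one. It is only an illustration under assumptions: the function names, the "," key separator, and the sample counts are all hypothetical and are not taken from the Mahout code base, and Python is used purely for brevity.

```python
# Hypothetical illustration only: none of these names come from Mahout.

def build_string_model(counts):
    """Store each weight under a concatenated "label,feature" string key."""
    model = {}
    for (label, feature), value in counts.items():
        # One string concatenation per entry (the separator is an assumption).
        model[label + "," + feature] = value
    return model


def build_matrix_model(counts):
    """Store weights in a dense label-by-feature matrix with index lookups."""
    labels = sorted({label for label, _ in counts})
    features = sorted({feature for _, feature in counts})
    label_index = {label: i for i, label in enumerate(labels)}
    feature_index = {feature: j for j, feature in enumerate(features)}

    # Rows are labels, columns are features; unseen pairs stay at 0.0.
    matrix = [[0.0] * len(features) for _ in labels]
    for (label, feature), value in counts.items():
        matrix[label_index[label]][feature_index[feature]] = value
    return matrix, label_index, feature_index


def get_weight(matrix, label_index, feature_index, feature, label):
    """Look up a weight by (feature, label) without building a string key."""
    return matrix[label_index[label]][feature_index[feature]]


# Made-up sample counts, keyed by (label, feature).
sample_counts = {("sports", "ball"): 3.0, ("politics", "vote"): 2.0}
string_model = build_string_model(sample_counts)
matrix, label_index, feature_index = build_matrix_model(sample_counts)
```

With the matrix form, the label and feature dictionaries are built once, and every later lookup is two index operations instead of a string concatenation plus a hash of the combined key.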

>
> I think once we get this in and maybe the ARFF reader, then we can put out
> a 0.1 release, which should get us some users and a bit more feedback, etc.
>  In these early stages, I think it is important to not be too tied to any
> particular way of doing it, as most of us are learning a lot as we go
> (except maybe Ted, who is an ML guru...)

+1
Expect my patch within the next hour (including just the changed Bayes
Fileformatter).

>
>
> -Grant
>
