mahout-dev mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: Rewrite of CBayes classifier
Date Sat, 25 Sep 2010 19:46:45 GMT
Rewrite Question

A key thing that improves the accuracy of naive Bayes on text is the
normalization of the TF vector (V):

new V_i = Log(1 + V_i) / SQRT(Sigma_k(V_k));

AbstractVector already does the L_p norm. Does it make sense to add one
function to do the above normalization, say logNormalize(double x)? I will be
adding this to the PartialVector merger (in DictionaryVectorizer). So there are
two choices: I can do this in the Vectorizer, or the Vectorizer can call this
function?
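
To make the formula concrete, here is a minimal sketch of that normalization on a plain double[] of term frequencies. It follows the formula exactly as written above, dividing by the square root of the sum of the raw values. The class name LogNormalizeSketch and the method signature are assumptions made for illustration; this is not existing Mahout code, and it does not use AbstractVector.

```java
/** Hypothetical helper for illustration only; not part of Mahout. */
final class LogNormalizeSketch {

  private LogNormalizeSketch() {
    // static helper, no instances
  }

  /**
   * Returns a new array where each entry is log(1 + tf[i]) / sqrt(sum_k tf[k]),
   * i.e. the formula as written in the message above.
   * Assumes term frequencies are non-negative.
   */
  static double[] logNormalize(double[] tf) {
    double sum = 0.0;
    for (double value : tf) {
      sum += value;
    }

    double[] result = new double[tf.length];
    if (sum <= 0.0) {
      // Empty document: leave every entry at zero instead of dividing by zero.
      return result;
    }

    double denominator = Math.sqrt(sum);
    for (int i = 0; i < tf.length; i++) {
      // log1p(x) computes log(1 + x) accurately for small x.
      result[i] = Math.log1p(tf[i]) / denominator;
    }
    return result;
  }
}
```

Keeping this as one function on the vector side would let the Vectorizer simply call it, which is the second of the two choices above.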



Robin


On Sat, Sep 25, 2010 at 10:22 PM, Sean Owen <srowen@gmail.com> wrote:

> I think it's fine to do a rewrite at this stage. 0.5 sounds like a
> nice goal. Just recall that aspects of this will be 'in print' soon so
> yeah you want to a) plan to deprecate rather than remove the old code
> for some time, b) make the existing code "forwards compatible" with
> what you'll do next while you have the chance!
>
> On Sat, Sep 25, 2010 at 2:32 PM, Robin Anil <robin.anil@gmail.com> wrote:
> > Hi, I was in the middle of changing the classifier over to vectors and I
> > realized how radically it will change for people using it and how difficult
> > it is to fit the new interfaces Ted checked in. There are many components to
> > it, including the HBase stuff, which will take a lot of time to port. I
> > think it's best to rewrite it from scratch, keeping the old version so
> > that it won't break for users using it. If that is agreeable, I can complete
> > a new map/reduce + in-memory classifier in o.a.m.c.naivebayes fitting the
> > interfaces and deprecate the old bayes package. The new package won't have
> > the full set of features of the old one for the 0.4 release. But it will be
> > functional, and hopefully future-proof. Let me know your thoughts.
> >
> > Robin
> >
>
