mahout-user mailing list archives

From Vishal Santoshi <vishal.santo...@gmail.com>
Subject Re: Detecting high bias and variance in AdaptiveLogisticRegression classification
Date Thu, 20 Feb 2014 16:49:26 GMT
Hey Ted,

>> I presume that you would like an Adagrad-like solution to replace the
above?

Here is what I could glean:




 *  Maintain a simple d-dimensional vector to store a running total of the
squares of the per-term gradients, where d is the number of terms. Call it
*gradients*.




*  Based on

     "Since the learning rate for each feature is quickly adapted, the
value for  is far less important than it is with SGD. I have used  = 1:0
for a very large number of different problems. The primary role of
     is to determine how much a feature changes the very first time it is
encountered, so in problems with large numbers of extremely rare features,
some additional care may be warranted."

     *How important or even necessary is  perTermLearningRate(j)  ?*




*  double newValue = beta.getQuick(i, j) + gradientBase * learningRate *
perTermLearningRate(j) * instance.get(j);

   becomes

    double gradient = gradientBase * instance.get(j);
    *gradients*(j) = *gradients*(j) + gradient * gradient;
    double newValue = beta.getQuick(i, j)
        + ( learningRate / Math.sqrt(*gradients*(j)) ) * gradient;

   (Note the accumulator is indexed by the term j, to match the
d-dimensional vector above, and the squared quantity is the per-term
gradient itself, not the new coefficient.)
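In symbols, the proposal above is the standard AdaGrad update (my
notation, not from the Mahout source; g_j is the per-term gradient
gradientBase * x_j):

    G_j \leftarrow G_j + g_j^2
    \beta_{i,j} \leftarrow \beta_{i,j} + \frac{\eta}{\sqrt{G_j}}\, g_j

Because \sqrt{G_j} rescales each term individually, η mostly just sets the
size of a feature's very first step. That is why the quote can get away
with η = 1.0, and why perTermLearningRate(j) would become redundant.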
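To make the whole step concrete, here is a rough standalone sketch of what
the change could look like. The helper and the sumSq accumulator are my
names, not Mahout's; it assumes the Matrix/Vector API from
org.apache.mahout.math, and I added a small epsilon to guard against a
zero gradient on a term's first encounter:

    import org.apache.mahout.math.Matrix;
    import org.apache.mahout.math.Vector;

    public class AdagradStep {
      // Hypothetical helper: one AdaGrad-style update for category row i.
      // sumSq is the d-dimensional running total of squared per-term
      // gradients proposed above; eta is the base learning rate.
      static void update(Matrix beta, double[] sumSq, int i,
                         double gradientBase, Vector instance, double eta) {
        final double epsilon = 1e-8;             // guards the first update
        for (Vector.Element e : instance.nonZeroes()) {
          int j = e.index();
          double g = gradientBase * e.get();     // per-term gradient
          sumSq[j] += g * g;                     // running total of g^2
          beta.setQuick(i, j, beta.getQuick(i, j)
              + (eta / Math.sqrt(sumSq[j] + epsilon)) * g);
        }
      }
    }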





Does this make sense? The only catch is that the abstract class has to
change.


Regards.




On Sun, Dec 29, 2013 at 8:45 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> :-)
>
> Many leaks are *very* subtle.
>
> One leak that had me going for weeks was in a news wire corpus.  I couldn't
> figure out why the cross validation was so good and running the classifier
> on new data was soooo much worse.
>
> The answer was that the training corpus had near-duplicate articles.  This
> means that there was leakage between the training and test corpora.  This
> wasn't quite a target leak, but it was a leak.
>
> For target leaks, it is very common to have partial target leaks due to the
> fact that you learn more about positive cases after the moment that you had
> to select which case to investigate.  Suppose, for instance, you are
> targeting potential customers based on very limited information.  If you
> make an enticing offer to the people you target, then those who accept the
> offer will buy something from you.  You will also learn some particulars
> such as name and address from those who buy from you.
>
> Looking retrospectively, it looks like you can target good customers who
> have names or addresses that are not null.  Without a good snapshot of each
> customer record at exactly the time that the targeting was done, you cannot
> know that *all* customers have a null name and address before you target
> them.  Time machine leaks of this sort can be enormously more subtle
> than this example.
>
>
>
> On Mon, Dec 2, 2013 at 1:50 PM, Gokhan Capan <gkhncpn@gmail.com> wrote:
>
> > Gokhan
> >
> >
> > On Thu, Nov 28, 2013 at 3:18 AM, Ted Dunning <ted.dunning@gmail.com>
> > wrote:
> >
> > > On Wed, Nov 27, 2013 at 7:07 AM, Vishal Santoshi <
> > > vishal.santoshi@gmail.com>
> > >
> > > >
> > > >
> > > > Are we to assume that SGD is still a work in progress and
> > > > implementations (Cross Fold, Online, Adaptive) are too flawed to be
> > > > realistically used?
> > > >
> > >
> > > They are too raw to be accepted uncritically, for sure.  They have been
> > > used successfully in production.
> > >
> > >
> > > > The evolutionary algorithm seems to be the core of
> > > > OnlineLogisticRegression,
> > > > which in turn builds up to Adaptive/Cross Fold.
> > > >
> > > > >> b) for truly on-line learning where no repeated passes through
> > > > the data..
> > > >
> > > > What would it take to get to an implementation?  How can anyone
> > > > help?
> > > >
> > >
> > > Would you like to help on this?  The amount of work required to get a
> > > distributed asynchronous learner up is moderate, but definitely not
> > > huge.
> > >
> >
> > Ted, are you describing a generic distributed learner for all kinds of
> > online algorithms? Possibly zookeeper-coordinated, and with #predict
> > and #getFeedbackAndUpdateTheModel methods?
> >
> > >
> > > I think that OnlineLogisticRegression is basically sound, but should
> > > get a better learning rate update equation.  That would largely make
> > > the Adaptive* stuff unnecessary, especially if OLR could be used in
> > > the distributed asynchronous learner.
> > >
> >
>
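P.S. For what it's worth, the generic learner contract Gokhan sketches
above might look roughly like this. Every name here is hypothetical; this
is only a sketch under his suggested method names, not anything that
exists in Mahout:

    import org.apache.mahout.math.Vector;

    // Hypothetical contract for a generic online learner, following
    // Gokhan's method names above. A zookeeper-coordinated driver would
    // own a set of these and merge their models asynchronously.
    public interface GenericOnlineLearner {
      // Score an instance with the current local model.
      Vector predict(Vector instance);

      // Consume the observed outcome and update the local model.
      void getFeedbackAndUpdateTheModel(Vector instance, int actual);
    }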
