mahout-user mailing list archives

From Mike Burba <mike.bu...@gmail.com>
Subject Re: Should I be using OnlineLogisticRegression?
Date Fri, 07 Sep 2012 15:23:08 GMT
It took some massaging to get the data into R. Here is the output you asked for,
covering the 6 predictor variables:

> summary(x)

       v1                  v2
 Min.   :   -55.0    Min.   :    0.0
 1st Qu.:     6.0    1st Qu.:    0.0
 Median :    62.0    Median :    2.0
 Mean   :   658.7    Mean   :   25.4
 3rd Qu.:   391.0    3rd Qu.:   13.0
 Max.   :461311.0    Max.   :21532.0

       v3                  v4
 Min.   :0.000e+00   Min.   :  3.00
 1st Qu.:1.821e+06   1st Qu.: 36.00
 Median :1.268e+07   Median : 47.00
 Mean   :2.345e+07   Mean   : 50.35
 3rd Qu.:3.364e+07   3rd Qu.: 62.00
 Max.   :5.820e+10   Max.   :257.00

       v5                  v6
 Min.   :    0.0     Min.   :1.000
 1st Qu.:  356.0     1st Qu.:2.000
 Median :  623.0     Median :3.000
 Mean   :  956.7     Mean   :2.862
 3rd Qu.: 1100.0     3rd Qu.:4.000
 Max.   :33413.0     Max.   :5.000
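One quick way to read this table for the "heavily skewed" values Ted mentions in the quoted reply is to compare how far the max sits above the 3rd quartile, measured in interquartile ranges. The sketch below is a minimal illustration under an arbitrary assumption: the 10-IQR cutoff and the helper names (`looks_right_skewed`, `flag_skewed`, `has_negatives`) are hypothetical, not anything R or Mahout provides. The numbers are copied straight from the summary above.

```python
# Min, quartiles, median and mean copied from the R summary(x) output above:
# (min, 1st qu., median, mean, 3rd qu., max) for each variable.
SUMMARY = {
    "v1": (-55.0, 6.0, 62.0, 658.7, 391.0, 461311.0),
    "v2": (0.0, 0.0, 2.0, 25.4, 13.0, 21532.0),
    "v3": (0.0, 1.821e6, 1.268e7, 2.345e7, 3.364e7, 5.820e10),
    "v4": (3.0, 36.0, 47.0, 50.35, 62.0, 257.0),
    "v5": (0.0, 356.0, 623.0, 956.7, 1100.0, 33413.0),
    "v6": (1.0, 2.0, 3.0, 2.862, 4.0, 5.0),
}


def looks_right_skewed(stats, tail_ratio=10.0):
    """Hypothetical rule of thumb: flag a variable as heavily right-skewed
    when the gap from the 3rd quartile to the max exceeds `tail_ratio` IQRs."""
    _min, q1, _median, _mean, q3, hi = stats
    iqr = q3 - q1
    if iqr == 0:
        # Degenerate middle half: any spread above q3 counts as a long tail.
        return hi > q3
    return (hi - q3) / iqr > tail_ratio


def flag_skewed(summary, tail_ratio=10.0):
    """Sorted names of the variables flagged by looks_right_skewed."""
    return sorted(name for name, stats in summary.items()
                  if looks_right_skewed(stats, tail_ratio))


def has_negatives(summary):
    """Sorted names of the variables whose minimum is below zero."""
    return sorted(name for name, stats in summary.items() if stats[0] < 0)


print(flag_skewed(SUMMARY))    # → ['v1', 'v2', 'v3', 'v5']
print(has_negatives(SUMMARY))  # → ['v1']
```

Under this rule v4 (max 7.5 IQRs above Q3) and v6 (a 1 to 5 range) stay as they are. v1 is also the only column with a negative minimum (-55), so a plain log cannot be applied to it directly.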

So now I am working through the transforming / scaling. Any top-of-mind
thoughts on the output above are welcome; they would help me validate my
thought process.
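A minimal Python sketch of the log transform Ted suggests, under two stated assumptions. First, several columns have a minimum of exactly 0, where log is undefined, so it uses log(1 + x) (`math.log1p`) instead. Second, v1 dips to -55, so it uses the signed variant sign(x) * log(1 + |x|). That variant is a swapped-in technique, not something proposed in the thread. The column list and the names `signed_log1p` and `transform_row` are hypothetical.

```python
import math


def signed_log1p(x):
    """sign(x) * log(1 + |x|): equals log1p(x) for x >= 0, is 0 at 0,
    and mirrors the curve for negative values (v1 has a min of -55)."""
    return math.copysign(math.log1p(abs(x)), x)


# Hypothetical choice of columns to compress, based on the large
# mean-vs-median gaps in the summary (e.g. v1: mean 658.7, median 62).
SKEWED_COLUMNS = ("v1", "v2", "v3", "v5")


def transform_row(row, skewed=SKEWED_COLUMNS):
    """Return a copy of `row` (column name -> raw value) with the skewed
    columns log-compressed and every other column passed through unchanged."""
    return {name: (signed_log1p(value) if name in skewed else value)
            for name, value in row.items()}


# The largest v3 value (5.82e10) shrinks to roughly 24.8; v4 is untouched.
print(transform_row({"v1": -55.0, "v3": 5.82e10, "v4": 47.0}))
```

The same transform has to be applied to every row at training and at scoring time, otherwise the learned weights will not match the inputs.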

Thanks for the hints, I will let you know how it turns out.

Mike

On Thu, Sep 6, 2012 at 8:14 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> Try transforming them as well, likely with a log if they are positive and
> have heavily skewed values.
>
> Can you suck the data into R and paste in the results of summary(x)?
> (assuming you put the data into the variable x).  This should look
> something like:
>
> > summary(x)
> >        v1                 v2                  v3
> >  Min.   :-3.41939   Min.   :0.0002538   Min.   :1.188
> >  1st Qu.:-0.66695   1st Qu.:0.3122501   1st Qu.:3.321
> >  Median :-0.07277   Median :0.6830144   Median :3.972
> >  Mean   :-0.05619   Mean   :1.0286261   Mean   :4.010
> >  3rd Qu.: 0.56784   3rd Qu.:1.4619058   3rd Qu.:4.712
> >  Max.   : 2.74271   Max.   :7.7754864   Max.   :7.252
> > >
>
>
> On Thu, Sep 6, 2012 at 4:58 PM, Diederik van Liere <Diederik.vanLiere@rotman.utoronto.ca> wrote:
>
> >
> > > - My (6) predictor variables are all numeric; some of the variables range
> > > from 0...5, others range from 0...1,000,000.
> > Have you tried rescaling your predictor variables so they have the same
> > range?
> >
> > Diederik
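Diederik's suggestion of giving every predictor the same range can be sketched as plain min-max scaling onto [0, 1]. This is a minimal Python illustration, not a Mahout API; the class name `RangeScaler` is hypothetical. It assumes the bounds are fitted on training data only and then reused unchanged for new examples.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RangeScaler:
    """Hypothetical min-max scaler: maps the [lo, hi] range seen during
    fitting onto [0, 1]. Values outside that range at scoring time will
    land outside [0, 1]."""
    lo: float
    hi: float

    @classmethod
    def fit(cls, values: Iterable[float]) -> "RangeScaler":
        vals = list(values)
        return cls(min(vals), max(vals))

    def transform(self, x: float) -> float:
        span = self.hi - self.lo
        if span == 0:
            return 0.0  # constant column: carries no signal
        return (x - self.lo) / span


# Raw v1 spans -55 ... 461311, so its median (62) lands near 0.00025.
raw_v1 = RangeScaler(lo=-55.0, hi=461311.0)
print(raw_v1.transform(62.0))
```

On the raw scale a few extreme values set the range and squeeze nearly all of the data toward 0. That suggests one plausible ordering: apply the log transform to the skewed columns first, then rescale.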
