mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: rate option of trainLogistic command
Date Fri, 21 Sep 2012 15:05:45 GMT
This changes the initial learning rate.  Changing this can definitely
change convergence properties.
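
One possible reading of the numbers quoted below, offered as an assumption rather than a diagnosis of this data set: if the classes are (nearly) linearly separable, the likelihood has no finite maximizer, so SGD stopped after a fixed number of passes returns coefficients whose size depends on the step size. Here is a minimal Python sketch on a made-up separable 1-D data set. The `train` helper and the 1/sqrt(step) decay schedule are hypothetical and are not Mahout's actual implementation.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(data, rate, passes=100):
    """Plain SGD for 1-D logistic regression.

    `rate` is the initial learning rate; the step size decays as
    rate / sqrt(step). This schedule is an assumption for illustration.
    """
    w, b, step = 0.0, 0.0, 0
    for _ in range(passes):
        for x, y in data:
            step += 1
            eta = rate / math.sqrt(step)
            err = y - sigmoid(w * x + b)  # gradient of the log-likelihood w.r.t. the logit
            w += eta * err * x
            b += eta * err
    return w, b

# Toy data, perfectly separable at x = 0 (made up for this sketch).
DATA = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

if __name__ == "__main__":
    for rate in (5.0, 50.0, 500.0):
        w, b = train(DATA, rate)
        print(f"rate={rate:>6}: w={w:.3f}  b={b:.3f}")
```

In this toy setting every rate classifies the four points correctly, yet the weight keeps growing with the initial rate, because a larger weight always gives a slightly higher likelihood on separable data.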

On Fri, Sep 21, 2012 at 9:33 AM, Watson Watson <watsonww@gmail.com> wrote:

> Hi,
> My question is: why does changing the rate parameter always change the
> coefficients (the results of TrainLogistic)?
>
> I first saw this puzzling effect on my own data, but since it can be
> reproduced with the simple example from the MIA book, I'll use that to
> frame my question:
> (running TrainLogistic exactly as in the book, then with other rate
> values; the runs below use rates 50, 40, 60, 500 and 50000 respectively)
> [banki@cos 1]$ mahout org.apache.mahout.classifier.sgd.TrainLogistic
> --input donut.csv --output donut.model --target color --categories 2
> --predictors x y a b c --types numeric --features 20 --passes 100 --rate 50
> 20
> color ~ 7.068*Intercept Term + 0.581*a + -1.369*b + -25.059*c + 0.581*x +
> 2.319*y
>       Intercept Term 7.06759
>                    a 0.58123
>                    b -1.36893
>                    c -25.05945
>                    x 0.58123
>                    y 2.31879
>     0.000000000     0.000000000     0.000000000     0.000000000
> 0.000000000    -1.368933989     0.000000000     0.000000000
> 0.000000000     0.000000000     0.581234210     0.000000000
> 0.000000000     7.067587159     0.000000000     0.000000000
> 0.000000000     2.318786209     0.000000000   -25.059452292
> 12/09/19 11:00:17 INFO driver.MahoutDriver: Program took 2262 ms (Minutes:
> 0.0377)
> [banki@cos 1]$ mahout org.apache.mahout.classifier.sgd.TrainLogistic
> --input donut.csv --output donut.model --target color --categories 2
> --predictors x y a b c --types numeric --features 20 --passes 100 --rate 40
> 20
> color ~ 5.882*Intercept Term + 0.445*a + -1.107*b + -20.912*c + 0.445*x +
> 1.855*y
>       Intercept Term 5.88183
>                    a 0.44521
>                    b -1.10685
>                    c -20.91159
>                    x 0.44521
>                    y 1.85450
>     0.000000000     0.000000000     0.000000000     0.000000000
> 0.000000000    -1.106846635     0.000000000     0.000000000
> 0.000000000     0.000000000     0.445207648     0.000000000
> 0.000000000     5.881825108     0.000000000     0.000000000
> 0.000000000     1.854504189     0.000000000   -20.911586416
> 12/09/19 11:00:58 INFO driver.MahoutDriver: Program took 2016 ms (Minutes:
> 0.0336)
> [banki@cos 1]$ mahout org.apache.mahout.classifier.sgd.TrainLogistic
> --input donut.csv --output donut.model --target color --categories 2
> --predictors x y a b c --types numeric --features 20 --passes 100 --rate 60
> 20
> color ~ 8.320*Intercept Term + 0.705*a + -1.669*b + -29.161*c + 0.705*x +
> 2.723*y
>       Intercept Term 8.31993
>                    a 0.70483
>                    b -1.66860
>                    c -29.16063
>                    x 0.70483
>                    y 2.72289
>     0.000000000     0.000000000     0.000000000     0.000000000
> 0.000000000    -1.668599735     0.000000000     0.000000000
> 0.000000000     0.000000000     0.704831781     0.000000000
> 0.000000000     8.319926323     0.000000000     0.000000000
> 0.000000000     2.722889944     0.000000000   -29.160634416
> 12/09/19 11:01:16 INFO driver.MahoutDriver: Program took 2291 ms (Minutes:
> 0.03818333333333333)
> [banki@cos 1]$ mahout org.apache.mahout.classifier.sgd.TrainLogistic
> --input donut.csv --output donut.model --target color --categories 2
> --predictors x y a b c --types numeric --features 20 --passes 100 --rate
> 500
> 20
> color ~ 55.909*Intercept Term + 7.925*a + -10.211*b + -197.573*c + 7.925*x
> + 12.743*y
>       Intercept Term 55.90868
>                    a 7.92520
>                    b -10.21115
>                    c -197.57275
>                    x 7.92520
>                    y 12.74325
>     0.000000000     0.000000000     0.000000000     0.000000000
> 0.000000000   -10.211151393     0.000000000     0.000000000
> 0.000000000     0.000000000     7.925202029     0.000000000
> 0.000000000    55.908675853     0.000000000     0.000000000
> 0.000000000    12.743250315     0.000000000  -197.572748252
> 12/09/19 11:14:23 INFO driver.MahoutDriver: Program took 1742 ms (Minutes:
> 0.029033333333333335)
> [banki@cos 1]$ mahout org.apache.mahout.classifier.sgd.TrainLogistic
> --input donut.csv --output donut.model --target color --categories 2
> --predictors x y a b c --types numeric --features 20 --passes 100 --rate
> 50000
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Running on hadoop, using /usr/bin/hadoop and
> HADOOP_CONF_DIR=/usr/lib/hadoop/conf
> MAHOUT-JOB: /opt/mahout/mahout-examples-0.7-job.jar
> 12/09/19 11:17:22 WARN driver.MahoutDriver: No
> org.apache.mahout.classifier.sgd.TrainLogistic.props found on classpath,
> will use command-line arguments only
> 20
> color ~ 5588.511*Intercept Term + 240.624*a + -207.160*b + -19609.709*c +
> 240.624*x + 1858.155*y
>       Intercept Term 5588.51071
>                    a 240.62409
>                    b -207.16022
>                    c -19609.70869
>                    x 240.62409
>                    y 1858.15547
>     0.000000000     0.000000000     0.000000000     0.000000000
> 0.000000000  -207.160221372     0.000000000     0.000000000
> 0.000000000     0.000000000   240.624090101     0.000000000
> 0.000000000  5588.510709572     0.000000000     0.000000000
> 0.000000000  1858.155468135     0.000000000 -19609.708690329
> 12/09/19 11:17:24 INFO driver.MahoutDriver: Program took 2135 ms (Minutes:
> 0.035583333333333335)
> So the coefficients change by almost the same multiplier that I apply to
> the learning rate.
> How can that be, when the coefficients found by the model are supposed to
> be at the extremum of the likelihood function?
>
> On the other data set I am using to understand the impact of the rate
> parameter, I see EXACT multiplication, i.e. when I decrease the rate
> parameter by a factor of 10, ALL coefficients decrease by exactly a factor
> of 10. What does this mean? Which coefficients can be taken as
> maximizing the likelihood function? Why does the algorithm show no sign
> of "stability of solution"?
>
> I would greatly appreciate any explanation of how the CLI command
> org.apache.mahout.classifier.sgd.TrainLogistic uses the learning rate
> parameter.
>
> Kind regards,
> Nikita Kuznetsov
>
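
The "almost the same multiplier" observation can be checked directly against the output quoted above. This is a small Python sketch that divides the rate-500 coefficients by the rate-50 ones. The values are copied from the quoted runs; the variable names are only illustrative.

```python
# Coefficients copied from the quoted TrainLogistic runs.
RATE_50 = {
    "Intercept Term": 7.06759, "a": 0.58123, "b": -1.36893,
    "c": -25.05945, "x": 0.58123, "y": 2.31879,
}
RATE_500 = {
    "Intercept Term": 55.90868, "a": 7.92520, "b": -10.21115,
    "c": -197.57275, "x": 7.92520, "y": 12.74325,
}

# Per-coefficient growth factor when the rate goes from 50 to 500 (10x).
ratios = {name: RATE_500[name] / RATE_50[name] for name in RATE_50}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name:>14}: x{ratio:.2f}")
```

If the ratios come out similar but not identical across coefficients, the two runs point in roughly the same direction in coefficient space and differ mainly in scale.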
