spark-user mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject Re: MLLib regression model weights
Date Thu, 18 Sep 2014 19:17:07 GMT
sc.parallelize(model.weights.toArray, blocks).top(k) will get you that, right?

For logistic regression you might want both positive and negative features, so
just rank the weights by abs and then pick top(k).
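
Note that top(k) on the raw weights returns only the values, not the feature
indices. Since model.weights is a local vector on the driver (only ~6700
entries here), a plain Scala collection is probably enough. Below is a minimal
sketch, not an MLlib API: the object and method names (TopWeights, topKByAbs,
aboveThreshold) are made up for illustration, and `weights` is assumed to be
model.weights.toArray.

```scala
// Hypothetical helpers, not part of MLlib.
// `weights` is assumed to be model.weights.toArray (a local Array[Double]).
object TopWeights {

  /** Top-k (featureIndex, weight) pairs ranked by |weight|, largest first. */
  def topKByAbs(weights: Array[Double], k: Int): Array[(Int, Double)] =
    weights.zipWithIndex
      .map { case (w, i) => (i, w) }                          // keep the index with each weight
      .sortWith((a, b) => math.abs(a._2) > math.abs(b._2))    // descending by magnitude
      .take(k)

  /** All (featureIndex, weight) pairs whose |weight| exceeds the threshold. */
  def aboveThreshold(weights: Array[Double], threshold: Double): Array[(Int, Double)] =
    weights.zipWithIndex
      .collect { case (w, i) if math.abs(w) > threshold => (i, w) }
}
```

Usage would look like TopWeights.topKByAbs(model.weights.toArray, 10). The
indices are 0-based positions in the weight vector, so map them back to your
LibSVM feature ids accordingly.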

On Thu, Sep 18, 2014 at 10:30 AM, Sameer Tilak <sstilak@live.com> wrote:

> Hi All,
>
> I am able to run LinearRegressionWithSGD on a small sample dataset (~60MB
> Libsvm file of sparse data) with 6700 features.
>
> val model = LinearRegressionWithSGD.train(examples, numIterations)
>
> At the end I get a model with:
>
> model.weights.size
> res6: Int = 6699
>
> I am assuming each entry in the model is the weight for the corresponding
> feature/index. However, if I want to get the top 10 most important
> features, or all features with weights higher than a certain threshold, is
> that functionality available out of the box? I can implement it on my own,
> but it seems like a common feature that most people will need when they
> are working on high-dimensional datasets.
>
