spark-dev mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject Re: mllib vector templates
Date Thu, 08 May 2014 05:25:58 GMT
Hi,

I see ALS is still using Array[Int], but for the other mllib algorithms we
moved to Vector[Double] so that they can support both dense and sparse formats.

ALS can stay on Array[Int] because the Netflix format for input datasets is
well defined, but it would help to move ALS to Vector[Double] as well. That
way all algorithms will be consistent.

The second issue is that toString on SparseVector does not write libsvm
format but something less generic. Can we change SparseVector.toString to
write libsvm output? I am dumping a sample of a dataset to see how the mllib
GLM compares with the glmnet R package for QoR.
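
For illustration, a libsvm-style line can be built from the indices and values of a sparse vector roughly as follows. This is only a sketch under stated assumptions: `LibSVMFormat` and `toLibSVMLine` are hypothetical names, not MLlib methods, and the input indices are assumed to be 0-based and sorted ascending (as in SparseVector), so they are shifted by one to match the 1-based indices that libsvm files conventionally use.

```scala
// Hypothetical helper, not part of MLlib: formats one example as a libsvm line
// of the form "<label> <index1>:<value1> <index2>:<value2> ...".
object LibSVMFormat {
  def toLibSVMLine(label: Double, indices: Array[Int], values: Array[Double]): String = {
    require(indices.length == values.length, "indices and values must have equal length")
    // Assumption: indices are 0-based; libsvm conventionally uses 1-based indices.
    val features = indices.zip(values).map { case (i, v) => s"${i + 1}:$v" }
    (label.toString +: features).mkString(" ")
  }
}
```

For example, `LibSVMFormat.toLibSVMLine(1.0, Array(0, 3), Array(0.5, 2.0))` returns `"1.0 1:0.5 4:2.0"`. Keeping this as a separate helper rather than changing toString would leave the existing string form untouched for anyone relying on it.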

Thanks.
Deb

On Mon, May 5, 2014 at 4:05 PM, David Hall <dlwh@cs.berkeley.edu> wrote:
>
>> On Mon, May 5, 2014 at 3:40 PM, DB Tsai <dbtsai@stanford.edu> wrote:
>>
>> > David,
>> >
>> > Could we use Int, Long, Float as the data feature spaces, and Double for
>> > optimizer?
>> >
>>
>> Yes. Breeze doesn't allow operations on mixed types, so you'd need to
>> convert the double vectors to Floats if you wanted, e.g. dot product with
>> the weights vector.
>>
>> You might also be interested in FeatureVector, which is just a wrapper
>> around Array[Int] that emulates an indicator vector. It supports dot
>> products, axpy, etc.
>>
>> -- David
>>
>>
>> >
>> >
>> > Sincerely,
>> >
>> > DB Tsai
>> > -------------------------------------------------------
>> > My Blog: https://www.dbtsai.com
>> > LinkedIn: https://www.linkedin.com/in/dbtsai
>> >
>> >
>> > On Mon, May 5, 2014 at 3:06 PM, David Hall <dlwh@cs.berkeley.edu> wrote:
>> >
>> > > Lbfgs and other optimizers would not work immediately, as they require
>> > > vector spaces over double. Otherwise it should work.
>> > > On May 5, 2014 3:03 PM, "DB Tsai" <dbtsai@stanford.edu> wrote:
>> > >
>> > > > Breeze could take any type (Int, Long, Double, and Float) in the
>> > > > matrix template.
>> > > >
>> > > >
>> > > > Sincerely,
>> > > >
>> > > > DB Tsai
>> > > > -------------------------------------------------------
>> > > > My Blog: https://www.dbtsai.com
>> > > > LinkedIn: https://www.linkedin.com/in/dbtsai
>> > > >
>> > > >
>> > > > On Mon, May 5, 2014 at 2:56 PM, Debasish Das <debasish.das83@gmail.com> wrote:
>> > > >
>> > > > > Is this a Breeze issue, or can Breeze take templates on float / double?
>> > > > >
>> > > > > If Breeze can take templates, then it is a minor fix for
>> > > > > Vectors.scala, right?
>> > > > >
>> > > > > Thanks.
>> > > > > Deb
>> > > > >
>> > > > >
>> > > > > On Mon, May 5, 2014 at 2:45 PM, DB Tsai <dbtsai@stanford.edu> wrote:
>> > > > >
>> > > > > > +1. It would be nice if we could use different types in Vector.
>> > > > > >
>> > > > > >
>> > > > > > Sincerely,
>> > > > > >
>> > > > > > DB Tsai
>> > > > > > -------------------------------------------------------
>> > > > > > My Blog: https://www.dbtsai.com
>> > > > > > LinkedIn: https://www.linkedin.com/in/dbtsai
>> > > > > >
>> > > > > >
>> > > > > > On Mon, May 5, 2014 at 2:41 PM, Debasish Das <debasish.das83@gmail.com> wrote:
>> > > > > >
>> > > > > > > Hi,
>> > > > > > >
>> > > > > > > Why is the mllib Vector using Double by default?
>> > > > > > >
>> > > > > > > /**
>> > > > > > >  * Represents a numeric vector, whose index type is Int and value type is Double.
>> > > > > > >  */
>> > > > > > > trait Vector extends Serializable {
>> > > > > > >
>> > > > > > >   /**
>> > > > > > >    * Size of the vector.
>> > > > > > >    */
>> > > > > > >   def size: Int
>> > > > > > >
>> > > > > > >   /**
>> > > > > > >    * Converts the instance to a double array.
>> > > > > > >    */
>> > > > > > >   def toArray: Array[Double]
>> > > > > > >
>> > > > > > > Don't we need a template on float/double? This will give us
>> > > > > > > memory savings...
>> > > > > > >
>> > > > > > > Thanks.
>> > > > > > >
>> > > > > > > Deb
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
