spark-issues mailing list archives

From "eyal sharon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-11302) Multivariate Gaussian Model with Covariance matrix return zero always
Date Mon, 26 Oct 2015 15:27:27 GMT

    [ https://issues.apache.org/jira/browse/SPARK-11302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974382#comment-14974382 ]

eyal sharon commented on SPARK-11302:
-------------------------------------

Sure, I will try to elaborate more.

MU is the mean vector of my data set.

Here is the basic flow of my code, with the functions I used. Each function runs
over the data set arranged in a matrix (one row per data point, one column per feature).


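*1- Create a mu vector*

import org.apache.spark.mllib.linalg.{DenseMatrix, Matrix, Vector, Vectors}

// Mean of each column (feature) of the data matrix
def createMU(mat: DenseMatrix): Vector = {
  val columnsInArray = toArrays(mat, false)
  Vectors.dense(columnsInArray.map(column => column.sum / column.length))
}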


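*2- Create a covariance matrix*

// Covariance of the columns, normalized by the number of rows (n, not n - 1)
def createCovSigma(mat: DenseMatrix, mu: Vector): DenseMatrix = {
  // Subtract the mean vector from every row (one row per data point)
  val rowsInArray = toArrays(mat, true)
  val sigmaSubMU = rowsInArray.map(row => row.zip(mu.toArray).map { case (x, m) => x - m })
  val checkArray = sigmaSubMU.flatten

  println("Matrix dimensions - rows: " + mat.numRows + ", cols: " + mat.numCols)
  // isTransposed = true: checkArray is laid out row by row
  val mat2 = new DenseMatrix(mat.numRows, mat.numCols, checkArray, true)
  // (X - mu)^T (X - mu): a numCols x numCols symmetric matrix
  val sigmaTmp: DenseMatrix = mat2.transpose.multiply(mat2)
  // Divide every entry by n; sigmaTmp is symmetric, so reading its
  // column-major array back as row-major (isTransposed = true) is harmless
  val sigmaMatrix = new DenseMatrix(mat.numCols, mat.numCols,
    sigmaTmp.toArray.map(_ / mat.numRows), true)

  sigmaMatrix
}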
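The normalization in createCovSigma matters whenever another tool estimates the covariance on its own: createCovSigma divides by n (the number of rows), whereas MATLAB's cov divides by n - 1 by default. Below is a minimal plain-Scala sketch (no Spark dependency; the object and method names are hypothetical) that computes both variants from rows of data so they can be compared. It makes no difference when the same precomputed matrix is fed to both tools.

```scala
// Hypothetical helper, independent of MLlib: covariance of data given as rows
// (one row per data point, one column per feature).
object CovarianceSketch {
  // unbiased = true divides by n - 1 (MATLAB's cov default);
  // unbiased = false divides by n, like createCovSigma.
  def covariance(rows: Array[Array[Double]], unbiased: Boolean): Array[Array[Double]] = {
    val n = rows.length
    val d = rows.head.length
    // Column means
    val mu = Array.tabulate(d)(j => rows.map(_(j)).sum / n)
    val denom = if (unbiased) n - 1.0 else n.toDouble
    // Entry (i, j) = sum over rows of (x_i - mu_i) * (x_j - mu_j), divided by denom
    Array.tabulate(d, d) { (i, j) =>
      rows.map(r => (r(i) - mu(i)) * (r(j) - mu(j))).sum / denom
    }
  }

  def main(args: Array[String]): Unit = {
    val data = Array(Array(1.0, 2.0), Array(3.0, 6.0), Array(5.0, 10.0))
    for (unbiased <- Seq(false, true)) {
      val c = covariance(data, unbiased)
      println(s"unbiased = $unbiased: " + c.map(_.mkString("[", ", ", "]")).mkString(" "))
    }
  }
}
```

For this toy data set the n - 1 version gives [[4, 8], [8, 16]], and the n-normalized version is 2/3 of that.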

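* Note that I am using an auxiliary function, toArrays; here is the definition:

// Splits a local matrix into its rows (byRow = true) or its columns (byRow = false).
// Matrix.toArray returns values in column-major order, so rows are read from the
// transpose; grouping the column-major array by numCols would not yield rows.
def toArrays(mat: Matrix, byRow: Boolean): Array[Array[Double]] = {
  if (byRow) mat.transpose.toArray.grouped(mat.numCols).toArray
  else mat.toArray.grouped(mat.numRows).toArray
}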
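Because toArrays depends on how the matrix is flattened, it helps to check that layout explicitly: MLlib's Matrix.toArray is documented as returning values in column-major order. The plain-Scala sketch below (no Spark dependency; object and method names are hypothetical) uses a 3 x 2 example to show that grouping a column-major array by numRows yields the columns, that rows have to be gathered with a stride, and that grouping the same array by numCols mixes entries from different rows.

```scala
// Hypothetical illustration, independent of MLlib: splitting a column-major
// flat array (element (i, j) stored at index j * numRows + i) into columns or rows.
object LayoutSketch {
  // Consecutive chunks of numRows values are the columns.
  def columns(values: Array[Double], numRows: Int): Array[Array[Double]] =
    values.grouped(numRows).toArray

  // Rows must be gathered with a stride of numRows.
  def rows(values: Array[Double], numRows: Int, numCols: Int): Array[Array[Double]] =
    Array.tabulate(numRows, numCols)((i, j) => values(j * numRows + i))

  def show(a: Array[Array[Double]]): String = a.map(_.mkString("[", ", ", "]")).mkString(" ")

  def main(args: Array[String]): Unit = {
    // The 3 x 2 matrix [[1, 2], [3, 4], [5, 6]] stored column-major
    val values = Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0)
    println("columns: " + show(columns(values, 3)))
    println("rows:    " + show(rows(values, 3, 2)))
    // Grouping by numCols (2) does not give rows
    println("grouped by numCols: " + show(values.grouped(2).toArray))
  }
}
```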


*3- With mu and sigma in hand, I can now create an instance of the
Gaussian*

import org.apache.spark.mllib.stat.distribution.MultivariateGaussian

val mg = new MultivariateGaussian(mu, sigma)
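For reference, the density evaluated by mg.pdf is the standard multivariate normal PDF, with k the number of features (4 here), μ the MU vector and Σ the covariance matrix:

```latex
f(x) = \frac{1}{\sqrt{(2\pi)^{k}\,\det\Sigma}}
       \exp\!\left(-\tfrac{1}{2}\,(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)
```

The MLlib Scaladoc for MultivariateGaussian states that when Σ is singular, the density is computed in the reduced-dimensional subspace on which the distribution is supported. In that degenerate case the usual formulation replaces det Σ with the pseudo-determinant (the product of the non-zero eigenvalues), Σ^{-1} with the pseudo-inverse Σ^{+}, and k with rank(Σ). How MLlib decides which eigenvalues count as zero is an implementation detail not covered here.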


*4- Now I can evaluate the PDF at each data point*

E.g.:

val d3 = mg.pdf(Vectors.dense(629, 640, 1.7188, 618.19))

The model returns zero for every data point.
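To cross-check the MLlib output independently of Spark (for example against MATLAB), the density can also be computed directly. The plain-Scala sketch below (object and method names are hypothetical; MLlib's own implementation is not reproduced here) works in log space via a Cholesky factorization Σ = L * L^T. It assumes Σ is strictly positive definite and throws an IllegalArgumentException otherwise, so unlike MLlib's MultivariateGaussian it does not handle singular matrices. Log space is useful here because, in double precision, exp(y) underflows to exactly 0.0 for y below about -745: a point whose log-density falls below that gets a pdf of exactly zero, even though its log-density is still an ordinary finite number.

```scala
// Hypothetical cross-check, independent of MLlib: multivariate normal
// log-density via a Cholesky factorization sigma = L * L^T.
object GaussianSketch {
  // Lower-triangular L with sigma = L * L^T; fails if sigma is not positive definite.
  def cholesky(a: Array[Array[Double]]): Array[Array[Double]] = {
    val n = a.length
    val l = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 to i) {
      val s = (0 until j).map(k => l(i)(k) * l(j)(k)).sum
      if (i == j) {
        val pivot = a(i)(i) - s
        require(pivot > 0, s"not positive definite (pivot $i = $pivot)")
        l(i)(i) = math.sqrt(pivot)
      } else {
        l(i)(j) = (a(i)(j) - s) / l(j)(j)
      }
    }
    l
  }

  // log f(x) = -0.5 * (k * log(2 * pi) + log det(sigma) + (x - mu)^T sigma^-1 (x - mu))
  def logPdf(x: Array[Double], mu: Array[Double], sigma: Array[Array[Double]]): Double = {
    val k = x.length
    val l = cholesky(sigma)
    // Forward substitution: solve L z = x - mu, so the quadratic form equals |z|^2
    val z = new Array[Double](k)
    for (i <- 0 until k) {
      val s = (0 until i).map(j => l(i)(j) * z(j)).sum
      z(i) = (x(i) - mu(i) - s) / l(i)(i)
    }
    val quad = z.map(v => v * v).sum
    // log det(sigma) = 2 * sum of log(L_ii)
    val logDet = 2.0 * (0 until k).map(i => math.log(l(i)(i))).sum
    -0.5 * (k * math.log(2 * math.Pi) + logDet + quad)
  }

  def main(args: Array[String]): Unit = {
    // MU vector, data point and diagonal covariance matrix as printed in this thread
    val mu = Array(1054.8, 1069.8, 1.3, 1040.1)
    val x = Array(629.0, 640.0, 1.7188, 618.19)
    val diagonal = Array(
      Array(165496.0, 0.0, 0.0, 0.0),
      Array(0.0, 170631.0, 0.0, 0.0),
      Array(0.0, 0.0, 0.8, 0.0),
      Array(0.0, 0.0, 0.0, 160594.2))
    val lp = logPdf(x, mu, diagonal)
    println(s"log pdf = $lp, pdf = ${math.exp(lp)}")
  }
}
```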



*5- For validation, I ran a Gaussian implementation in MATLAB. The results
are:*

- For the *non-covariance* (diagonal) matrix, the two models yield exactly the
same result.
- For the *covariance* matrix, MATLAB yields good results but MLlib doesn't
(note that I fed both models the same input, concretely the same MU and
covariance matrix).


Best, Eyal







>  Multivariate Gaussian Model with Covariance  matrix return zero always 
> ------------------------------------------------------------------------
>
>                 Key: SPARK-11302
>                 URL: https://issues.apache.org/jira/browse/SPARK-11302
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>            Reporter: eyal sharon
>            Priority: Minor
>
> I have been trying to apply an anomaly detection model using Spark MLlib.
> As input, I feed the model a mean vector and a covariance matrix, assuming
> my features co-vary.
> Here is my input for the model; the model returns zero for each data point
> for this input.
> MU vector -
> 1054.8, 1069.8, 1.3, 1040.1
> Covariance matrix -
> 165496.0, 167996.0, 11.0, 163037.0
> 167996.0, 170631.0, 19.0, 165405.0
>     11.0,     19.0,  0.0,      2.0
> 163037.0, 165405.0,  2.0, 160707.0
> Conversely, for the non-covariance case, represented by this matrix, the
> model works and returns results as expected:
> 165496.0,      0.0, 0.0,      0.0
>      0.0, 170631.0, 0.0,      0.0
>      0.0,      0.0, 0.8,      0.0
>      0.0,      0.0, 0.0, 160594.2
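A quick sanity check on the two matrices in the issue description, sketched in plain Scala (the object and method names are hypothetical, not MLlib API): every valid covariance matrix is symmetric and satisfies |Σ_ij| ≤ sqrt(Σ_ii * Σ_jj) for every pair of features. This is only a necessary condition (a complete test would look at the eigenvalues), and the printed values may be rounded for display, so a violation may reflect formatting rather than the exact input. As printed, the third feature has variance 0.0 but non-zero covariances (11.0, 19.0 and 2.0) with the other features, which the bound does not allow; the diagonal matrix passes.

```scala
// Hypothetical diagnostic, not part of MLlib: checks symmetry and the pairwise
// bound |S(i)(j)| <= sqrt(S(i)(i) * S(j)(j)), a necessary (but not sufficient)
// condition for a matrix to be a valid covariance matrix.
object CovarianceCheck {
  // Returns the (i, j) index pairs, i < j, that violate symmetry or the bound.
  def violations(s: Array[Array[Double]], tol: Double = 1e-9): Seq[(Int, Int)] = {
    val d = s.length
    for {
      i <- 0 until d
      j <- (i + 1) until d
      if math.abs(s(i)(j) - s(j)(i)) > tol ||
        math.abs(s(i)(j)) > math.sqrt(s(i)(i) * s(j)(j)) + tol
    } yield (i, j)
  }

  def main(args: Array[String]): Unit = {
    // Covariance matrix as printed in the issue description
    val reported = Array(
      Array(165496.0, 167996.0, 11.0, 163037.0),
      Array(167996.0, 170631.0, 19.0, 165405.0),
      Array(11.0, 19.0, 0.0, 2.0),
      Array(163037.0, 165405.0, 2.0, 160707.0))
    // Diagonal matrix from the "non covariance" case
    val diagonal = Array(
      Array(165496.0, 0.0, 0.0, 0.0),
      Array(0.0, 170631.0, 0.0, 0.0),
      Array(0.0, 0.0, 0.8, 0.0),
      Array(0.0, 0.0, 0.0, 160594.2))
    println("reported: " + violations(reported))
    println("diagonal: " + violations(diagonal))
  }
}
```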



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
