spark-dev mailing list archives

From danqing0703 <>
Subject Problems concerning implementing machine learning algorithm from scratch based on Spark
Date Tue, 30 Dec 2014 06:12:47 GMT
Hi all,

I am trying to use some machine learning algorithms that are not included
in MLlib, such as mixture models and LDA (Latent Dirichlet Allocation),
and I am using PySpark and Spark SQL.

My problem is: I have some scripts that implement these algorithms, but I
am not sure which parts I should change to make them scale to big data.

   - Even some very simple calculations may take a long time if the data
   is too big, but constructing an RDD or a SQLContext table also takes a
   lot of time. I am really not sure whether I should use map() and
   reduce() for every computation I need to make.
   - Also, there are some matrix/array-level calculations that cannot be
   implemented easily using only map() and reduce(), so functions from the
   NumPy package would have to be used. I am not sure what happens when the
   data is too big and we simply call the NumPy functions. Will it take too
   much time?
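(For the second point, one common pattern is to keep NumPy but apply it per partition rather than per element, so the vectorized math runs over a whole chunk of rows at once. Below is a minimal sketch of that idea for a Gaussian-mixture log-likelihood; the function name, the mixture parameters, and the mapPartitions wiring are illustrative assumptions, not existing MLlib code.)

```python
import numpy as np

def partition_log_likelihood(rows, means, covs, weights):
    """Summed Gaussian-mixture log-likelihood for one partition.

    `rows` is an iterable of feature vectors. Materializing the whole
    partition as one NumPy array lets the vectorized math run in C,
    instead of calling a Python function per record via map().
    (Illustrative sketch; parameter handling is deliberately minimal.)
    """
    X = np.array(list(rows))
    if X.size == 0:
        return iter([0.0])           # empty partition contributes nothing
    n, d = X.shape
    k = len(weights)
    comp = np.zeros((n, k))          # per-row weighted component densities
    for j in range(k):
        diff = X - means[j]
        inv = np.linalg.inv(covs[j])
        norm = 1.0 / np.sqrt(((2 * np.pi) ** d) * np.linalg.det(covs[j]))
        # Mahalanobis term computed for the whole partition at once
        comp[:, j] = weights[j] * norm * np.exp(
            -0.5 * np.sum(diff @ inv * diff, axis=1))
    return iter([float(np.sum(np.log(comp.sum(axis=1))))])

# With a SparkContext `sc` (assumed), the per-partition sums reduce to
# one driver-side number:
# total = (sc.parallelize(data, 4)
#            .mapPartitions(lambda it:
#                partition_log_likelihood(it, means, covs, weights))
#            .reduce(lambda a, b: a + b))
```

This way the Python-level overhead is one call per partition rather than one per record, and the shuffle only moves one float per partition.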

I have found some scripts that are not from MLlib and were created by other
developers (credits to Meethu Mathew from Flytxt; thanks for giving me the
scripts).
Many thanks, and I look forward to getting feedback!

Best,
Danqing
