spark-dev mailing list archives

From Tamer TAS <>
Subject Apache Spark GSOC 2015
Date Wed, 11 Mar 2015 20:22:39 GMT
Hello Everyone,

I'm a senior year computer engineering student in Turkey.
My main areas of interest are cloud computing and machine learning.

I've been working on Apache Spark using the Scala API for a few months. My projects involved the
use of MLlib for a movie recommendation system and a stock prediction model. I would be interested
in working on Spark for GSOC 2015. From my experience, there are a few enhancements that can be made:
 - Learning models can be standardized in a hierarchical manner to increase code quality and
make future algorithm implementations easier. For example, even though it's in the GraphX library,
SVD++ doesn't have a model implementation. Currently it only returns the pieces of the calculation.
The documentation wasn't clear either (apart from the link to the SVD++ paper).
 - New algorithms might be implemented, such as restricted Boltzmann machines, tensor models
and tensor factorization for the recommendation sub-library, and SVM multi-class classification.
 - Testing documentation was close to nonexistent (only a blog post link). Each test creates a new
Spark context. Workarounds were necessary to increase testing productivity (e.g. the pass, fail,
refactor cycle was taking a long time).
But don't get the idea that I dislike Spark for not having those features. I loved working
with Spark and I'd be happy to work on improving it. I'm mainly interested in the model hierarchy
and new machine learning algorithms for Spark MLlib and GraphX, if there is anyone who would be
interested in mentoring. I'll work on a proposal to give more details about the algorithms and a
timeline. I just wanted to give a heads-up before doing so.
If you have any questions, please feel free to ask.
Thanks in advance.

Tamer Tas