gora-dev mailing list archives

From Furkan KAMACI <furkankam...@gmail.com>
Subject GSoC Period of Spark Backend Support for Gora (GORA-386)
Date Sun, 10 May 2015 22:48:22 GMT

I've organized the GSoC pages on the Gora wiki and created a template for
reports [1], based on previous Nutch and Gora reports.

For my GSoC period, I started with the paper that introduced Spark [2]
and then finished the RDD paper [3]. I've also started reading Spark's
documentation. I'm planning to continue with the Dryad [4] and YARN [5]
papers.

After these, my aim is to gain a comprehensive understanding of why Spark
was introduced, how it relates to Hadoop, how it differs from other
related frameworks, and how it is implemented from an architectural
perspective.

The next step will be diving into Gora (including picking up issues to
solve; by the way, you are welcome to suggest any!) and implementing a
piece of code that transforms GoraInputFormat into a Spark RDD.

What do you suggest for the next steps? (Everybody can comment on this,
not just my mentors.)
Also, Lewis and Talat, when do you want me to start the weekly reporting
process?

Kind Regards,

PS: I'm blogging about my GSoC work on my personal blog [6].

[1] https://cwiki.apache.org/confluence/display/GORA/Google+Summer+of+Code
[2] http://www.cs.berkeley.edu/~matei/papers/2010/hotcloud_spark.pdf
[3] https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
[4] http://research.microsoft.com/pubs/63785/eurosys07.pdf
[5] http://dl.acm.org/citation.cfm?id=2523633
[6] http://furkankamaci.com/
