Date: Tue, 17 Dec 2013 11:30:45 -0800
Subject: Spark development for undergraduate project
From: Matthew Cheah
To: dev@spark.incubator.apache.org

Hi everyone,

During my most recent internship, I worked extensively with Apache Spark, integrating it into a company's data analytics platform. I've now become interested in contributing to Apache Spark.

I'm returning to undergraduate studies in January, and one of the available academic courses is simply a standalone software engineering project. I was thinking that a contribution to Apache Spark would satisfy my curiosity, help me continue supporting the company I interned at, and give me the academic credits I need to graduate, all at the same time. It seems like too good an opportunity to pass up.

With that in mind, I have the following questions:

1. At this point, is there any self-contained project that I could work on within Spark? Ideally, I would work on it independently, in about a three-month time frame. That time also needs to accommodate ramping up on the Spark codebase and adjusting to the Scala programming language and its paradigms; the company I worked at primarily used the Java APIs. The output needs to be a technical report describing the project requirements and the design process I followed to engineer a solution that meets them. In particular, it cannot just be a series of haphazard patches.

2. How can I get started with contributing to Spark?

3. Is there a high-level UML diagram or some other design specification for the Spark architecture?

Thanks! I hope to be of some help =)

-Matt Cheah