Date: Mon, 16 Jun 2014 07:11:02 +0000 (UTC)
From: "Artem Tsikiridis (JIRA)"
To: dev@flink.incubator.apache.org
Subject: [jira] [Comment Edited] (FLINK-838) GSoC Summer Project: Implement full Hadoop Compatibility Layer for Stratosphere

[
https://issues.apache.org/jira/browse/FLINK-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032168#comment-14032168 ]

Artem Tsikiridis edited comment on FLINK-838 at 6/16/14 7:10 AM:
-----------------------------------------------------------------

Hello,

here is a report of the fourth week.

In short: I worked more on the runtime environment of a Hadoop job (see point 2), added support for custom partitioning and intermediate sorting (comparator, group comparator), and prepared an environment for distributed testing.

1) We reach the midterm evaluation of the program in two weeks' time. As Robert suggested above, it would be nice to merge the first version of the abstraction layer. That would cover the following Hadoop mapred interfaces: Mapper, Reducer, Combiner, a basic driver (just parsing the conf and starting a job), and the comparator and partitioner interfaces which I worked on this week. Basically, all of Hadoop's programming interfaces from start to finish. I am currently trying to improve test coverage for this branch and will try it on the cluster today. So in a few days (mid-week) it will be virtually ready for code review and merging. Also, I would be happy to assist with testing 777 if it is needed.

2) Then, while 1) is being code-reviewed, I will work in parallel on the advanced features of a Hadoop driver. There I have some issues, mostly because, to have a working RunningJob for Hadoop's JobClient, I need to access information from Flink's Nephele cluster that is abstracted away. As a result, I repeat myself a lot in the environment code. I was wondering if it is possible to refactor the environments (e.g. break submitJobandWait into submitJob and wait, and generally have a wait). This is the nature of the changes. However, I believe this discussion can happen after the midterm, when a first version of the project is already merged. So in the next two weeks, I will focus on 1) exclusively.

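To make the refactoring idea in point 2 concrete, here is a minimal sketch of splitting a blocking submit-and-wait call into a non-blocking submit plus a separate wait. It is only an illustration under assumptions: the class SplitSubmitSketch, the JobHandle type and all method signatures are hypothetical and are not the actual Flink/Nephele or Hadoop API. The job itself is modelled as a plain Callable run on a standard java.util.concurrent executor.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical client used only for illustration; not Flink's or Hadoop's real API. */
class SplitSubmitSketch {

    /** Handle returned by the non-blocking submit (hypothetical type). */
    static final class JobHandle {
        private final Future<Long> result;

        JobHandle(Future<Long> result) {
            this.result = result;
        }

        /** Non-blocking check whether the job has finished. */
        boolean isComplete() {
            return result.isDone();
        }

        /** Blocks until the job finishes and returns the value it produced. */
        long waitForCompletion() throws InterruptedException, ExecutionException {
            return result.get();
        }
    }

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Non-blocking half: starts the job and returns a handle immediately. */
    JobHandle submitJob(Callable<Long> job) {
        return new JobHandle(executor.submit(job));
    }

    /** The blocking variant becomes a simple composition of the two halves. */
    long submitJobAndWait(Callable<Long> job) throws InterruptedException, ExecutionException {
        return submitJob(job).waitForCompletion();
    }

    /** Releases the executor thread once no more jobs will be submitted. */
    void shutdown() {
        executor.shutdown();
    }
}
```

With a split like this, a wrapper that needs to poll or query a running job can hold on to the handle, while existing callers that only need the blocking behaviour keep using the combined method.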
was (Author: atsikiridis):

Hello,

here is a report of the fourth week.

In short: I worked more on the runtime environment of a Hadoop job (see point 2), added support for custom partitioning and intermediate sorting (comparator, group comparator), and prepared an environment for distributed testing.

1) We reach the midterm evaluation of the program in two weeks' time. As Robert suggested above, it would be nice to merge the first version of the abstraction layer. That would cover the following Hadoop mapred interfaces: Mapper, Reducer, Combiner, a basic driver (just parsing the conf and starting a job), and the comparator and partitioner interfaces which I worked on this week. I am currently trying to improve test coverage for this branch and will try it on the cluster today. So in a few days (mid-week) it will be virtually ready for code review and merging. Also, I would be happy to assist with testing 777 if it is needed.

2) Then, while 1) is being code-reviewed, I will work in parallel on the advanced features of a Hadoop driver. There I have some issues, mostly because, to have a working RunningJob for Hadoop's JobClient, I need to access information from Flink's Nephele cluster that is abstracted away. As a result, I repeat myself a lot in the environment code. I was wondering if it is possible to refactor the environments (e.g. break submitJobandWait into submitJob and wait, and generally have a wait). This is the nature of the changes. However, I believe this discussion can happen after the midterm, when a first version of the project is already merged. So in the next two weeks, I will focus on 1) exclusively.

> GSoC Summer Project: Implement full Hadoop Compatibility Layer for Stratosphere
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-838
>                 URL: https://issues.apache.org/jira/browse/FLINK-838
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
>
>
> This is a meta issue for tracking @atsikiridis's progress with implementing a full Hadoop Compatibility Layer for Stratosphere.
> Some documentation can be found in the Wiki: https://github.com/stratosphere/stratosphere/wiki/%5BGSoC-14%5D-A-Hadoop-abstraction-layer-for-Stratosphere-(Project-Map-and-Notes)
> As well as the project proposal: https://github.com/stratosphere/stratosphere/wiki/GSoC-2014-Project-Proposal-Draft-by-Artem-Tsikiridis
> Most importantly, there is the following **schedule**:
> *19 May - 27 June (Midterm)*
> 1) Work on the Hadoop tasks, their Context and the mapping of Hadoop's Configuration to that of Stratosphere. By successfully bridging the Hadoop tasks with Stratosphere, we already cover the most basic Hadoop jobs. This can be verified by running some popular Hadoop examples on Stratosphere (e.g. WordCount, k-means, join). (4 - 5 weeks)
> 2) Understand how running these jobs works (e.g. the command line interface) for the wrapper. Implement how the user will run them. (1 - 2 weeks)
> *27 June - 11 August*
> 1) Continue wrapping more "advanced" Hadoop interfaces (Comparators, Partitioners, Distributed Cache etc.). There are quite a few interfaces and it will be a challenge to support all of them. (5 full weeks)
> 2) Profiling of the application and optimizations (if applicable)
> *11 August - 18 August*
> Write documentation on the code, write a README with care and add more unit tests. (1 week)
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/838
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: core, enhancement, parent-for-major-feature,
> Milestone: Release 0.7 (unplanned)
> Created at: Tue May 20 10:11:34 CEST 2014
> State: open

--
This message was sent by Atlassian JIRA
(v6.2#6252)