flink-dev mailing list archives

From "Artem Tsikiridis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-838) GSoC Summer Project: Implement full Hadoop Compatibility Layer for Stratosphere
Date Mon, 16 Jun 2014 07:07:02 GMT

    [ https://issues.apache.org/jira/browse/FLINK-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032168#comment-14032168 ]

Artem Tsikiridis commented on FLINK-838:


Here is a report of the fourth week.


Worked more on the runtime environment of a Hadoop job (see point 2). Added support for custom
partitioning and intermediate sorting (comparator, grouping comparator).
Prepared an environment for distributed testing.
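
For readers unfamiliar with the three hooks mentioned above, here is a minimal plain-Java sketch of how a partitioner, a sort comparator and a grouping comparator typically cooperate during a shuffle. It is an illustration only: it does not use Hadoop or Flink classes, and every name in it (ShuffleHooksSketch, Key, partition, sortAndGroup) is hypothetical and not taken from the branch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration only; it does not use Hadoop or Flink classes.
public class ShuffleHooksSketch {

    /** Composite key: a natural key (name) plus a secondary field (seq). */
    static final class Key {
        final String name;
        final int seq;

        Key(String name, int seq) {
            this.name = name;
            this.seq = seq;
        }
    }

    /** Partitioner: keys with the same name always land in the same partition. */
    static int partition(Key key, int numPartitions) {
        return (key.name.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Sort comparator: orders by name, then by seq within a name. */
    static final Comparator<Key> SORT =
            Comparator.comparing((Key k) -> k.name).thenComparingInt(k -> k.seq);

    /** Grouping comparator: keys with equal names belong to one reduce call. */
    static final Comparator<Key> GROUP = Comparator.comparing((Key k) -> k.name);

    /** Sorts the keys of one partition and splits them into reduce groups. */
    static List<List<Key>> sortAndGroup(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<Key>> groups = new ArrayList<>();
        Key previous = null;
        for (Key k : sorted) {
            // Start a new group whenever the grouping comparator sees a change.
            if (previous == null || GROUP.compare(previous, k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
            previous = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> keys = Arrays.asList(
                new Key("b", 2), new Key("a", 3), new Key("b", 1), new Key("a", 1));
        for (List<Key> group : sortAndGroup(keys)) {
            StringBuilder line = new StringBuilder(group.get(0).name + ":");
            for (Key k : group) {
                line.append(' ').append(k.seq);
            }
            System.out.println(line);
        }
    }
}
```

The point of keeping the sort and grouping comparators separate is that values can arrive at a reduce call already ordered by a secondary field while still being grouped by the natural key only.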


We are reaching the midterm evaluation of the program in two weeks' time. As Robert suggested
above, it would be nice to merge the first version
of the abstraction layer. That would cover support for the following Hadoop mapred interfaces:
Mapper, Reducer, Combiner, a basic driver
(just parsing the conf and starting a job), and the comparator and partitioner interfaces that
I worked on this week.

I am currently trying to improve test coverage for this branch and will try it on the cluster
today. So in a few days (mid-week)
it will be virtually ready to be code-reviewed and merged. Also, I would be happy to
assist with testing 777 if it is


Then, as soon as 1 is being code-reviewed, I will work in parallel on the advanced features
of a Hadoop driver.

This is where I have some issues, mostly because, to have a working RunningJob for Hadoop's
JobClient, I need to access information from Flink's Nephele cluster that is abstracted away.
As you can see, I repeat myself a lot in the environment code. Is it possible to refactor the
environments (e.g. break submitJobandWait into submitJob and wait, or more generally have a
wait)? That is the nature of the changes. However, I believe this discussion can happen after
the midterm, when a first version of the project is already merged.
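
To make the kind of change I am asking about concrete, here is a rough sketch of splitting a blocking submit-and-wait call into a non-blocking submit plus a separate wait. It is plain Java and only an assumption about the shape such a refactoring could take: SplitSubmitSketch, JobHandle, isComplete and waitForCompletion are hypothetical names, and the "job" is simulated with a Supplier rather than anything from Nephele.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names are taken from Flink or Hadoop.
public class SplitSubmitSketch {

    /** Handle for a submitted job, roughly what a RunningJob-like wrapper would need. */
    static final class JobHandle {
        private final CompletableFuture<Long> result;

        JobHandle(CompletableFuture<Long> result) {
            this.result = result;
        }

        /** Non-blocking status check. */
        boolean isComplete() {
            return result.isDone();
        }

        /** Blocks until the job finishes and returns its simulated result. */
        long waitForCompletion() {
            return result.join();
        }
    }

    /** Non-blocking submission: starts the job and returns a handle immediately. */
    static JobHandle submitJob(Supplier<Long> job) {
        return new JobHandle(CompletableFuture.supplyAsync(job));
    }

    /** The blocking variant, now expressed as submit followed by wait. */
    static long submitJobAndWait(Supplier<Long> job) {
        return submitJob(job).waitForCompletion();
    }

    public static void main(String[] args) {
        JobHandle handle = submitJob(() -> 42L);
        System.out.println("result: " + handle.waitForCompletion());
        System.out.println("complete: " + handle.isComplete());
    }
}
```

With a handle like this, the blocking call stays available for existing callers, while a driver can poll or wait on its own schedule instead of duplicating the environment code.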

> GSoC Summer Project: Implement full Hadoop Compatibility Layer for Stratosphere
> -------------------------------------------------------------------------------
>                 Key: FLINK-838
>                 URL: https://issues.apache.org/jira/browse/FLINK-838
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
> This is a meta issue for tracking @atsikiridis's progress with implementing a full Hadoop
Compatibility Layer for Stratosphere.
> Some documentation can be found in the Wiki: https://github.com/stratosphere/stratosphere/wiki/%5BGSoC-14%5D-A-Hadoop-abstraction-layer-for-Stratosphere-(Project-Map-and-Notes)
> As well as the project proposal: https://github.com/stratosphere/stratosphere/wiki/GSoC-2014-Project-Proposal-Draft-by-Artem-Tsikiridis
> Most importantly, there is the following **schedule**:
> *19 May - 27 June (Midterm)*
> 1) Work on the Hadoop tasks, their Context and the mapping of Hadoop's Configuration
to the one of Stratosphere. By successfully bridging the Hadoop tasks with Stratosphere, we
already cover the most basic Hadoop Jobs. This can be determined by running some popular Hadoop
examples on Stratosphere (e.g. WordCount, k-means, join) (4 - 5 weeks)
> 2) Understand how the running of these jobs works (e.g. command line interface) for the
wrapper. Implement how the user will run them. (1 - 2 weeks)
> *27 June - 11 August*
> 1) Continue wrapping more "advanced" Hadoop Interfaces (Comparators, Partitioners, Distributed
Cache etc.) There are quite a few interfaces and it will be a challenge to support all of
them. (5 full weeks)
> 2) Profiling of the application and optimizations (if applicable)
> *11 August - 18 August*
> Write documentation on code, write a README with care and add more unit-tests. (1 week)
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/838
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: core, enhancement, parent-for-major-feature, 
> Milestone: Release 0.7 (unplanned)
> Created at: Tue May 20 10:11:34 CEST 2014
> State: open

This message was sent by Atlassian JIRA