flink-dev mailing list archives

From "Artem Tsikiridis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-838) GSoC Summer Project: Implement full Hadoop Compatibility Layer for Stratosphere
Date Tue, 10 Jun 2014 01:04:01 GMT

    [ https://issues.apache.org/jira/browse/FLINK-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025997#comment-14025997 ]

Artem Tsikiridis commented on FLINK-838:

Hello, here is an update on what happened in the third week of GSoC, plus some short-term plans.

*tl;dr version*

Worked on the JobClient for Hadoop on Flink. The interface is mostly transparent to a Hadoop
user, and it's almost ready for local execution. Not all issues have been addressed, though:
some hacks, especially those that go down to the lower level (Nephele), should probably be
reconsidered. There is now a real Reporter. Figured out the best way to map Accumulators to
Counters. Tests need to be written this week.

*Slightly longer version*

There isn't much to analyze in depth this week. The Hadoop-to-Flink JobClient is almost ready
for local execution. In some parts I need to access the NepheleMiniCluster, and I'm not
particularly happy with the way this is done, even though it works.

InputFormats for the driver:
As [~twalthr] said above, once getParameter() is possible for interfaces, this will be much cleaner.
Until then, I'm doing what is described above (accessing the InputSplit etc.).

*Next week*

- Hopefully be done with LocalExecution, or better, not make a distinction at all. I still need
to figure out whether this is possible.
- Accumulators --> Counters: make sure the mapping is applied everywhere.
- Parallelism: I've tried several approaches (different execution environments, DOP,
etc.). Figure out which one is the most generic and works best.
- Apply the naming change (Stratosphere to Flink) in my branches. Get used to Flink!
- Tests, tests, tests: very important for getting the Hadoop driver ready for a PR. Currently
it is not.
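To make the "Accumulators --> Counters" item more concrete, here is a minimal, self-contained sketch of one way such a mapping could look. It is not the code in the branch, and every type in it (LongSumAccumulator, AccumulatorBackedReporter, mergeAll) is a hypothetical stand-in rather than the real Flink Accumulator or Hadoop Reporter/Counters classes; only the method shape incrCounter(group, counter, amount) mirrors Hadoop's old-API Reporter. The assumption is that each Hadoop counter (group, name) becomes one long-valued accumulator keyed "group.name", and that the job client sums the per-task values after execution.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: Hadoop-style counters backed by accumulator-style objects.
// None of these types are the real Flink or Hadoop classes.
public class CounterBridgeSketch {

    // Hypothetical stand-in for a long-sum accumulator.
    static final class LongSumAccumulator {
        private long value;

        void add(long amount) { value += amount; }

        long getLocalValue() { return value; }
    }

    // Hypothetical Reporter facade: each (group, name) counter maps to one
    // accumulator registered under the key "group.name".
    static final class AccumulatorBackedReporter {
        private final Map<String, LongSumAccumulator> accumulators = new TreeMap<>();

        static String key(String group, String name) {
            return group + "." + name;
        }

        // Same shape as Hadoop's Reporter.incrCounter(String, String, long).
        void incrCounter(String group, String name, long amount) {
            accumulators
                .computeIfAbsent(key(group, name), k -> new LongSumAccumulator())
                .add(amount);
        }

        // Returns 0 for a counter that was never incremented.
        long getCounter(String group, String name) {
            LongSumAccumulator acc = accumulators.get(key(group, name));
            return acc == null ? 0L : acc.getLocalValue();
        }

        Map<String, LongSumAccumulator> accumulators() {
            return accumulators;
        }
    }

    // Sums the per-task accumulators by key, as a job client might after execution.
    static Map<String, Long> mergeAll(Iterable<AccumulatorBackedReporter> reporters) {
        Map<String, Long> totals = new TreeMap<>();
        for (AccumulatorBackedReporter reporter : reporters) {
            for (Map.Entry<String, LongSumAccumulator> e : reporter.accumulators().entrySet()) {
                totals.merge(e.getKey(), e.getValue().getLocalValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        AccumulatorBackedReporter task1 = new AccumulatorBackedReporter();
        AccumulatorBackedReporter task2 = new AccumulatorBackedReporter();
        task1.incrCounter("WordCount", "WORDS", 3);
        task2.incrCounter("WordCount", "WORDS", 4);
        task2.incrCounter("WordCount", "LINES", 1);
        // Prints {WordCount.LINES=1, WordCount.WORDS=7}
        System.out.println(mergeAll(Arrays.asList(task1, task2)));
    }
}
```

Keying by "group.name" keeps the mapping one-to-one and lets counters from different tasks be summed by name; a real implementation would register the accumulators through Flink's runtime context instead of a local map.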


I will submit the basic tasks branch as a PR once FLINK-777 is merged. It already has tests, so
I think it's ready. Hopefully, after I write tests for the driver this week, the Hadoop driver
branch will be in the same state.

> GSoC Summer Project: Implement full Hadoop Compatibility Layer for Stratosphere
> -------------------------------------------------------------------------------
>                 Key: FLINK-838
>                 URL: https://issues.apache.org/jira/browse/FLINK-838
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
> This is a meta issue for tracking @atsikiridis's progress with implementing a full Hadoop Compatibility Layer for Stratosphere.
> Some documentation can be found in the Wiki: https://github.com/stratosphere/stratosphere/wiki/%5BGSoC-14%5D-A-Hadoop-abstraction-layer-for-Stratosphere-(Project-Map-and-Notes)
> As well as the project proposal: https://github.com/stratosphere/stratosphere/wiki/GSoC-2014-Project-Proposal-Draft-by-Artem-Tsikiridis
> Most importantly, there is the following **schedule**:
> *19 May - 27 June (Midterm)*
> 1) Work on the Hadoop tasks, their Context and the mapping of Hadoop's Configuration to the one of Stratosphere. By successfully bridging the Hadoop tasks with Stratosphere, we already cover the most basic Hadoop Jobs. This can be determined by running some popular Hadoop examples on Stratosphere (e.g. WordCount, k-means, join) (4 - 5 weeks)
> 2) Understand how running these jobs works for the wrapper (e.g. the command line interface). Implement how the user will run them. (1 - 2 weeks)
> *27 June - 11 August*
> 1) Continue wrapping more "advanced" Hadoop Interfaces (Comparators, Partitioners, Distributed Cache etc.). There are quite a few interfaces and it will be a challenge to support all of them. (5 full weeks)
> 2) Profiling of the application and optimizations (if applicable)
> *11 August - 18 August*
> Write code documentation, write a README with care and add more unit tests. (1 week)
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/838
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: core, enhancement, parent-for-major-feature, 
> Milestone: Release 0.7 (unplanned)
> Created at: Tue May 20 10:11:34 CEST 2014
> State: open

This message was sent by Atlassian JIRA
