flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Artem Tsikiridis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-838) GSoC Summer Project: Implement full Hadoop Compatibility Layer for Stratosphere
Date Wed, 06 Aug 2014 09:15:13 GMT

    [ https://issues.apache.org/jira/browse/FLINK-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087451#comment-14087451
] 

Artem Tsikiridis commented on FLINK-838:
----------------------------------------

Yes, I am talking about comparators to sort keys before reducing.

I do it by adding some functionality to {{WritableComparator}} of Flink. ( https://github.com/atsikiridis/incubator-flink/blob/sorting-hadoop/stratosphere-java/src/main/java/eu/stratosphere/api/java/typeutils/runtime/WritableComparator.java
)

So far this comparator sorts in ascending order by default. Now it accepts a hadoop {{Comparator}}
which it wraps.

And also I extended the api of {{WritableTypeInformation}} so when specifying the return types
of {{HadoopMapFunction}}  one can set this custom hadoop comparator. I have tried some test
cases (descending keys comparator, comparator by first letter etc.

> GSoC Summer Project: Implement full Hadoop Compatibility Layer for Stratosphere
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-838
>                 URL: https://issues.apache.org/jira/browse/FLINK-838
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
>
>
> This is a meta issue for tracking @atsikiridis progress with implementing a full Hadoop
Compatibliltiy Layer for Stratosphere.
> Some documentation can be found in the Wiki: https://github.com/stratosphere/stratosphere/wiki/%5BGSoC-14%5D-A-Hadoop-abstraction-layer-for-Stratosphere-(Project-Map-and-Notes)
> As well as the project proposal: https://github.com/stratosphere/stratosphere/wiki/GSoC-2014-Project-Proposal-Draft-by-Artem-Tsikiridis
> Most importantly, there is the following **schedule**:
> *19 May - 27 June (Midterm)*
> 1) Work on the Hadoop tasks, their Context and the mapping of Hadoop's Configuration
to the one of Stratosphere. By successfully bridging the Hadoop tasks with Stratosphere, we
already cover the most basic Hadoop Jobs. This can be determined by running some popular Hadoop
examples on Stratosphere (e.g. WordCount, k-means, join) (4 - 5 weeks)
> 2) Understand how the running of these jobs works (e.g. command line interface) for the
wrapper. Implement how will the user run them. (1 - 2 weeks).
> *27 June - 11 August*
> 1) Continue wrapping more "advanced" Hadoop Interfaces (Comparators, Partitioners, Distributed
Cache etc.) There are quite a few interfaces and it will be a challenge to support all of
them. (5 full weeks)
> 2) Profiling of the application and optimizations (if applicable)
> *11 August - 18 August*
> Write documentation on code, write a README with care and add more unit-tests. (1 week)
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/838
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: core, enhancement, parent-for-major-feature, 
> Milestone: Release 0.7 (unplanned)
> Created at: Tue May 20 10:11:34 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message