flink-dev mailing list archives

From "Markus Holzemer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-668) API Proposal - NamedDataSets
Date Wed, 18 Jun 2014 16:44:25 GMT

    [ https://issues.apache.org/jira/browse/FLINK-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035919#comment-14035919 ]

Markus Holzemer commented on FLINK-668:
---------------------------------------

The discussion on this topic continues in a newer issue (FLINK-947).

> API Proposal - NamedDataSets
> ----------------------------
>
>                 Key: FLINK-668
>                 URL: https://issues.apache.org/jira/browse/FLINK-668
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
>
>
> @StephanEwen, @aljoscha and I were discussing a further stage, or alternative version,
> of the new Java API that we called NamedDataSets. Instead of dealing with specific types that
> are checked at compile time, users should be able to operate on fields simply by name.
> The types would be checked not at compile time but at pre-flight time, which would give
> a feel closer to SQL.
> Currently users often have to remember the position of a specific field in the tuple,
> which gets tedious in bigger queries. Using names instead would perhaps make this
> more manageable.
> I have created a first proposal for the syntax that we can use as a basis for discussion:
> ```
> NamedDataSet nds = get3TupleDataSet(env).named("ID", "Number", "Comment");
> 		
> NamedDataSet join = get3TupleDataSet(env).named("ID", "Number", "Comment");
> 		
> NamedDataSet join_result = nds.join(join).where("ID").equalTo("ID");
> 		
> NamedDataSet group_result = nds.groupBy("ID");
> // to apply a udf
> NamedDataSet reduceDs = nds.get("ID", "Number", "Comment")
> 				.types(Integer.class, Long.class, String.class)
> 				.groupBy(1).reduce(new Tuple3Reduce("B-)")).named("ID", "Number", "Comment");
> 		
> reduceDs.get("ID", "Number", "Comment").types(Integer.class, Long.class, String.class).print();
> env.execute();
> ```
> My current development progress can be looked at here:
> https://github.com/markus-h/stratosphere/compare/named_dataset
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/668
> Created by: [markus-h|https://github.com/markus-h]
> Labels: enhancement, java api, user satisfaction
> Created at: Tue Apr 08 13:31:59 CEST 2014
> State: open
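
The pre-flight check described in the quoted proposal (field names resolved to tuple positions, with types verified before the job runs rather than by the compiler) could be sketched roughly as follows. This is only an illustration under stated assumptions: `NamedSchema`, `positionOf`, `typeOf` and `checkJoinKeys` are hypothetical names invented here, and nothing below is taken from Flink, Stratosphere, or the linked branch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps field names to tuple positions and types.
// It is not part of any Flink or Stratosphere API.
final class NamedSchema {
    private final Map<String, Integer> positions = new LinkedHashMap<>();
    private final Class<?>[] types;

    NamedSchema(List<String> names, Class<?>... types) {
        if (names.size() != types.length) {
            throw new IllegalArgumentException(
                "Expected " + types.length + " field names, got " + names.size());
        }
        for (int i = 0; i < names.size(); i++) {
            // put() returns the previous mapping, so non-null means a duplicate name.
            if (positions.put(names.get(i), i) != null) {
                throw new IllegalArgumentException("Duplicate field name: " + names.get(i));
            }
        }
        this.types = types.clone();
    }

    // Resolves a name to its tuple position; fails before any data is processed.
    int positionOf(String name) {
        Integer pos = positions.get(name);
        if (pos == null) {
            throw new IllegalArgumentException(
                "Unknown field '" + name + "', known fields: " + positions.keySet());
        }
        return pos;
    }

    Class<?> typeOf(String name) {
        return types[positionOf(name)];
    }

    // Assumed pre-flight rule: join keys on both sides must have the same type.
    static void checkJoinKeys(NamedSchema left, String leftKey,
                              NamedSchema right, String rightKey) {
        Class<?> l = left.typeOf(leftKey);
        Class<?> r = right.typeOf(rightKey);
        if (!l.equals(r)) {
            throw new IllegalArgumentException(
                "Join key type mismatch: " + l.getName() + " vs " + r.getName());
        }
    }

    public static void main(String[] args) {
        NamedSchema schema = new NamedSchema(
            Arrays.asList("ID", "Number", "Comment"),
            Integer.class, Long.class, String.class);
        System.out.println(schema.positionOf("Number"));        // prints 1
        checkJoinKeys(schema, "ID", schema, "ID");               // passes silently
    }
}
```

Under this sketch, a misspelled field name or mismatched join key types would surface as an exception while the plan is being built, which is the trade-off the proposal describes: errors move from compile time to pre-flight time, but still occur before execution.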



--
This message was sent by Atlassian JIRA
(v6.2#6252)
