Flink Forward 2015 is the first
conference with Flink at its center that aims to bring together the
Apache Flink community in a single place. The organizers are holding
this conference on October 12 and 13 in Berlin, the place where
@@ -310,7 +310,7 @@ community last month.
We are happy to announce a new major Stratosphere release, version 0.5. This release adds many new features and improves the interoperability, stability, and performance of the system. The major theme of the release is the completely new Java API that makes it easy to write powerful distributed programs.
We are happy to announce that Stratosphere has been accepted as a project for the Apache Incubator. The proposal was accepted by the Incubator PMC members earlier this week. The Apache Incubator is the first step in the process of donating a project to the Apache Software Foundation. While under incubation, the project will move to the Apache infrastructure and adopt the community-driven development principles of the Apache Software Foundation. Projects can graduate from incubation to become top-level projects if they show activity, a healthy community dynamic, and releases.
We recently merged a pull request that allows you to use any existing Hadoop InputFormat with Stratosphere. You can now (in 0.5-SNAPSHOT and later versions) define a Hadoop-based data source:
Stratosphere’s hybrid approach combines MapReduce and MPP database techniques. One central part of this approach is the separation between the programming interface (API) and the way programs are executed (execution plans). The compiler/optimizer decides the details, such as what to cache and when to partition or broadcast data, with a holistic view of the program. The same program may actually be executed differently in different scenarios (input data of different sizes, different numbers of machines).
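To illustrate how one program can yield different execution plans, here is a minimal sketch of a cost-based choice between broadcasting and repartitioning the inputs of a join. It is not Stratosphere's optimizer: the names `JoinPlan` and `choose_join_plan` and the network-cost formulas are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinPlan:
    strategy: str         # "broadcast-left", "broadcast-right", or "repartition"
    network_bytes: float  # estimated bytes shipped over the network


def choose_join_plan(left_bytes: float, right_bytes: float, machines: int) -> JoinPlan:
    """Pick the cheapest shipping strategy under a toy network-cost model (assumed)."""
    if machines < 1:
        raise ValueError("machines must be >= 1")
    others = machines - 1
    candidates = [
        # Broadcasting replicates one whole input to every other machine.
        JoinPlan("broadcast-left", left_bytes * others),
        JoinPlan("broadcast-right", right_bytes * others),
        # Hash-partitioning moves roughly (n-1)/n of both inputs.
        JoinPlan("repartition", (left_bytes + right_bytes) * others / machines),
    ]
    return min(candidates, key=lambda plan: plan.network_bytes)


# The same logical join, planned for two cluster sizes.
small_cluster = choose_join_plan(left_bytes=1e9, right_bytes=1e8, machines=4)
large_cluster = choose_join_plan(left_bytes=1e9, right_bytes=1e8, machines=100)
print(small_cluster.strategy, large_cluster.strategy)
```

Under this toy model, the join broadcasts the smaller input on 4 machines but repartitions both inputs on 100 machines, even though the program itself is unchanged.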
The Stratosphere team is proud to announce that it is going to present at the Hadoop Summit 2014 in Amsterdam on April 2-3. Our talk “Big Data looks tiny from Stratosphere” is part of the “Future of Hadoop” Track. The talk abstract already made it into the top 5 in the Community Vote that took place at the end of last year.
Stratosphere won second place in the competition organized by Humboldt Innovation on "Big Data: Research meets Startups," where several research projects were evaluated by a panel of experts from the Berlin startup ecosystem. The award includes a monetary prize of 10,000 euros.
Our paper “‘All Roads Lead to Rome:’ Optimistic Recovery for Distributed Iterative Data Processing,” authored by Sebastian Schelter, Kostas Tzoumas, Stephan Ewen and Volker Markl, has been accepted at the ACM International Conference on Information and Knowledge Management (CIKM 2013) in San Francisco.
Our demo submission "Large-Scale Social-Media Analytics on Stratosphere" by Christoph Boden, Marcel Karnstedt, Miriam Fernandez and Volker Markl has been accepted for WWW 2013 in Rio de Janeiro, Brazil.

Visit our demo, and talk to us if you are attending WWW 2013.

Abstract:
The importance of social-media platforms and online communities, in business as well as public contexts, is increasingly acknowledged and appreciated by industry and researchers alike. Consequently, a wide range of analytics has been proposed to understand, steer, and exploit the mechanics and laws driving their functionality and creating the resulting benefits. However, analysts usually face significant problems in scaling existing and novel approaches to match the data volume and size of modern online communities. In this work, we propose and demonstrate the usage of the massively parallel data processing system Stratosphere, based on second-order functions as an extended notion of the MapReduce paradigm, to provide a new level of scalability to such social-media analytics. Based on the popular example of role analysis, we present and illustrate how this massively parallel approach can be leveraged to scale out complex data-mining tasks, while providing a programming approach that eases the formulation of complete analytical workflows.
This is a preview of our demo that will be presented at ICDE 2013 in Brisbane.
The demo shows how static code analysis can be leveraged to reorder UDF operators in data flow programs.

Detailed information can be found in our papers, which are available on the publication page.
Our demo submission "Applying Stratosphere for Big Data Analytics" has been accepted for BTW 2013 in Magdeburg, Germany.
The demo focuses on Stratosphere's query language Meteor, which has been presented in our paper "Meteor/Sopremo: An Extensible Query Language and Operator Model" [pdf] at the BigData workshop associated with VLDB 2012 in Istanbul.

Visit our demo, and talk to us if you are going to attend BTW 2013.

Abstract:
Analyzing big data sets as they occur in modern business and science applications requires query languages that allow for the specification of complex data processing tasks. Moreover, these ideally declarative query specifications have to be optimized, parallelized and scheduled for processing on massively parallel data processing platforms. This paper demonstrates the application of Stratosphere to different kinds of Big Data Analytics tasks. Using examples from different application domains, we show how to formulate analytical tasks as Meteor queries and execute them with Stratosphere. These examples include data cleansing and information extraction tasks, and a correlation analysis of microblogging and stock trade volume data that we describe in detail in this paper.
Our demo submission "Peeking into the Optimization of Data Flow Programs with MapReduce-style UDFs" has been accepted for ICDE 2013 in Brisbane, Australia.
The demo illustrates the contributions of our VLDB 2012 paper "Opening the Black Boxes in Data Flow Optimization" [PDF] and [Poster PDF].

Visit our poster, enjoy the demo, and talk to us if you are going to attend ICDE 2013.

Abstract:
Data flows are a popular abstraction to define data-intensive processing tasks. In order to support a wide range of use cases, many data processing systems feature MapReduce-style user-defined functions (UDFs). In contrast to UDFs as known from relational DBMS, MapReduce-style UDFs have less strict templates. These templates alone do not provide all the information needed to decide whether they can be reordered with relational operators and other UDFs. However, it is well-known that reordering operators such as filters, joins, and aggregations can yield runtime improvements by orders of magnitude.
We demonstrate an optimizer for data flows that is able to reorder operators with MapReduce-style UDFs written in an imperative language. Our approach leverages static code analysis to extract information from UDFs which is used to reason about the reorderability of UDF operators. This information is sufficient to enumerate a large fraction of the search space covered by conventional RDBMS optimizers including filter and aggregation push-down, bushy join orders, and choice of physical execution strategies based on interesting properties.
We demonstrate our optimizer and a job submission client that allows users to peek step-by-step into each phase of the optimization process: the static code analysis of UDFs, the enumeration of reordered candidate data flows, the generation of physical execution plans, and their parallel execution. For the demonstration, we provide a selection of relational and non-relational data flow programs which highlight the salient features of our approach.
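As a rough illustration of how information extracted from UDFs can drive reordering decisions, the sketch below assumes that the analysis yields, for each record-at-a-time UDF, the set of fields it reads and the set of fields it writes. The names `UdfInfo` and `can_reorder` are hypothetical, and the conflict rule is a simplification for illustration, not the optimizer's actual condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UdfInfo:
    name: str
    reads: frozenset   # fields the UDF reads (assumed output of code analysis)
    writes: frozenset  # fields the UDF modifies or adds


def can_reorder(first: UdfInfo, second: UdfInfo) -> bool:
    """Simplified rule: two UDFs may swap if neither writes a field the other touches."""
    return (
        first.writes.isdisjoint(second.reads)
        and second.writes.isdisjoint(first.reads)
        and first.writes.isdisjoint(second.writes)
    )


# A filter on "price" can move past a UDF that only adds "country" ...
price_filter = UdfInfo("price_filter", frozenset({"price"}), frozenset())
add_country = UdfInfo("add_country", frozenset({"user_id"}), frozenset({"country"}))
# ... but not past a UDF that rewrites "price" itself.
convert_price = UdfInfo("convert_price", frozenset({"price"}), frozenset({"price"}))

print(can_reorder(add_country, price_filter))    # True
print(can_reorder(convert_price, price_filter))  # False
```

With such a check, an optimizer could push the selective filter ahead of `add_country` to shrink the data early, while keeping it behind `convert_price` to preserve the program's result.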
We are happy to announce that version 0.2 of the Stratosphere System has been released. It has a lot of performance improvements as well as a bunch of exciting new features like:

- The new Sopremo Algebra Layer and the Meteor Scripting Language
- The whole new tuple data model for the PACT API
- Fault tolerance through local checkpoints
- A ton of performance improvements on all layers
- Support for plug-ins on the data flow channel layer
- Many new library classes (for example new Input-/Output-Formats)

For a complete list of new features, check out the change log.