accumulo-user mailing list archives

From Peter Coetzee <pe...@coetzee.org>
Subject New research using Accumulo: Unified Secure On-/Off-line Analytics
Date Mon, 20 Oct 2014 08:00:51 GMT
New open-access research published in the journal Parallel Computing
demonstrates a novel approach to engineering analytics for deployment in
streaming and batch contexts.

Increasing numbers of users are extracting real value from their data using
tools like IBM InfoSphere Streams for near-real-time analysis and Apache
Spark for batch analysis of their historical data in Accumulo.

Until now, there hasn't been an approach that permits the use of these
tools from a single shared codebase, with deployment considerations
deferred until deployment time. Furthermore, it has been even harder to
permit this unified analysis while maintaining cell-level traces of the
security heritage for each datum an analytic produces.

Some highlights of the paper include:
  - A domain-specific language (CRUCIBLE) and runtime models for on- and
off-line data analytics.
  - Detailed analysis of CRUCIBLE’s runtime performance in state-of-the-art
environments.
  - Development and detailed analysis of a set of runtime models for new
environments.
  - Performance comparison with native implementations and discussion of
optimisation steps.
  - Formulation of a primitive in the DSL that permits an analytic to be
run over multiple data sources.

The paper, Towards Unified Secure On- and Off-line Analytics at Scale, is
available free of charge from Elsevier:

http://www.sciencedirect.com/science/article/pii/S0167819114000842


I am one of the lead authors of the work, and would be more than happy to
discuss any aspects which catch your attention!

Peter

--
Peter Coetzee
Performance Computing and Visualisation PhD Candidate
Department of Computer Science
University of Warwick
