atlas-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <>
Subject [jira] [Assigned] (ATLAS-183) Add a Hook in Storm to post the topology metadata
Date Tue, 29 Dec 2015 09:34:49 GMT


Hemanth Yamijala reassigned ATLAS-183:

    Assignee: Hemanth Yamijala

> Add a Hook in Storm to post the topology metadata
> -------------------------------------------------
>                 Key: ATLAS-183
>                 URL:
>             Project: Atlas
>          Issue Type: Sub-task
>    Affects Versions: 0.6-incubating
>            Reporter: Venkatesh Seetharam
>            Assignee: Hemanth Yamijala
>             Fix For: 0.6-incubating
>         Attachments: ATLAS-183.patch
> Apache Storm Integration with Apache Atlas (incubating)
> Introduction
> Apache Storm is a distributed real-time computation system. Storm makes it easy to reliably
process unbounded streams of data, doing for real-time processing what Hadoop did for batch
processing. The computation is essentially a DAG of nodes, called a topology.
> Apache Atlas is a metadata repository that enables end-to-end data lineage, search, and
association of business classifications.
> Overview
> The goal of this integration is, at a minimum, to push the operational topology metadata
along with the underlying data source(s), target(s), derivation processes and any available
business context so Atlas can capture the lineage for this topology.
> It would also help to support custom user annotations per node in the topology.
> This process has two parts, detailed below:
> Data model to represent the concepts in Storm
> Storm Bridge to update metadata in Atlas
> Data Model
> A data model is represented as a Type in Atlas. It contains the descriptions of the various
nodes in the DAG, such as spouts and bolts, and the corresponding source and target types.
These need to be expressed as Types in the Atlas type system. At a minimum, we need to create
types for:
> Storm topology containing spouts, bolts, etc. with associations between them
> Source (typically Kafka, etc.)
> Target (typically Hive, HBase, HDFS, etc.)
> You can take a look at the data model code for Hive. Storm should be simpler than
Hive from a data modeling perspective.
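
To make the three kinds of types above concrete, here is a minimal sketch in plain Python. It does not use the Atlas type system or any Storm API; every class, field, and type name in it (DataSet, Node, Topology, "kafka_topic", "hive_table", lineage_edges) is a hypothetical placeholder chosen for illustration, not the model the patch defines.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class DataSet:
    """A source or target of a topology (hypothetical shape)."""
    type_name: str  # e.g. "kafka_topic" or "hive_table" (illustrative names)
    name: str


@dataclass
class Node:
    """A spout or bolt, with the names of the nodes that feed it."""
    name: str
    kind: str  # "spout" or "bolt"
    inputs: List[str] = field(default_factory=list)


@dataclass
class Topology:
    """A topology with its nodes and the data sets it reads and writes."""
    name: str
    nodes: List[Node]
    sources: List[DataSet]
    targets: List[DataSet]


def lineage_edges(topology: Topology) -> List[Tuple[str, str]]:
    """Return (from, to) pairs: source -> topology, then topology -> target."""
    edges = [(s.name, topology.name) for s in topology.sources]
    edges += [(topology.name, t.name) for t in topology.targets]
    return edges


topology = Topology(
    name="clickstream",
    nodes=[
        Node("kafka_spout", "spout"),
        Node("parse_bolt", "bolt", inputs=["kafka_spout"]),
        Node("hive_bolt", "bolt", inputs=["parse_bolt"]),
    ],
    sources=[DataSet("kafka_topic", "clicks")],
    targets=[DataSet("hive_table", "default.clicks")],
)

print(lineage_edges(topology))
```

The associations between spouts and bolts are kept as names in Node.inputs, while lineage is derived only from the sources and targets, which is the part Atlas would need to connect this topology to other data sets.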
> Pushing Metadata into Atlas
> There are 2 parts to the bridge:
> Storm Bridge 
> This is a one-time import that lists all the active topologies in Storm and pushes their metadata
into Atlas, to address cases where Storm deployments exist before Atlas.
> You can refer to the bridge code for Hive.
> Post-execution Hook
> Atlas needs to be notified when a new topology is registered successfully in Storm or
when someone changes the definition of an existing topology.
> You can refer to the hook code for Hive.
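
The difference between the one-time bridge and the hook can be sketched as follows. This is an illustration under stated assumptions only: InMemoryAtlas stands in for the Atlas repository, topologies are plain dicts, and the function names import_active_topologies and on_topology_submitted are hypothetical. None of this is the real Storm or Atlas API.

```python
from typing import Dict, Iterable, List


class InMemoryAtlas:
    """Hypothetical stand-in for the Atlas repository, keyed by topology name."""

    def __init__(self) -> None:
        self.entities: Dict[str, dict] = {}

    def upsert(self, name: str, definition: dict) -> str:
        """Store the definition and report whether it was new or a change."""
        status = "updated" if name in self.entities else "created"
        self.entities[name] = dict(definition)
        return status


def import_active_topologies(active: Iterable[dict], atlas: InMemoryAtlas) -> List[str]:
    """Bridge: one-time import of every topology that is already running."""
    return [atlas.upsert(t["name"], t) for t in active]


def on_topology_submitted(topology: dict, atlas: InMemoryAtlas) -> str:
    """Hook: called after a topology is registered or its definition changes."""
    return atlas.upsert(topology["name"], topology)


atlas = InMemoryAtlas()

# Bridge run against a deployment that existed before Atlas.
import_active_topologies(
    [{"name": "clickstream", "bolts": 2}, {"name": "billing", "bolts": 1}],
    atlas,
)

# Hook calls: one changed definition, one brand-new topology.
print(on_topology_submitted({"name": "clickstream", "bolts": 3}, atlas))
print(on_topology_submitted({"name": "alerts", "bolts": 1}, atlas))
```

Both paths end in the same upsert, so a topology imported by the bridge and later resubmitted through the hook results in one entry with the latest definition rather than a duplicate.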
> Example use case:
> Custom annotations associated with each node in the topology.  
> For example: Data Quality Rules, Error Handling, etc. A set of annotations could enumerate the
rules for handling nulls, e.g. all nulls for a column get filtered.
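
One possible shape for such per-node annotations is sketched below. The annotation keys ("rule", "column"), the rule name "filter_nulls", and the node name are all assumptions made for this example; the issue does not define an annotation format.

```python
from typing import Dict, List

# Hypothetical annotations keyed by node name; each entry describes one rule.
annotations: Dict[str, List[dict]] = {
    "parse_bolt": [{"rule": "filter_nulls", "column": "user_id"}],
}


def apply_rules(node: str, records: List[dict]) -> List[dict]:
    """Apply the rules annotated on a node; only 'filter_nulls' is understood here."""
    for rule in annotations.get(node, []):
        if rule["rule"] == "filter_nulls":
            column = rule["column"]
            # Drop every record whose value for the column is missing or None.
            records = [r for r in records if r.get(column) is not None]
    return records


records = [
    {"user_id": 1, "page": "/home"},
    {"user_id": None, "page": "/cart"},
    {"page": "/help"},
]

print(apply_rules("parse_bolt", records))
```

Atlas would only need to store the annotation itself as metadata on the node; the apply_rules function is included just to show what the null-handling rule means.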

This message was sent by Atlassian JIRA