atlas-dev mailing list archives

From "Ayush Nigam (JIRA)" <>
Subject [jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas
Date Thu, 20 Sep 2018 05:20:00 GMT


Ayush Nigam commented on ATLAS-2708:

[~toopt4] You have to write your own Lambda function, triggered by changes made to an S3 object,
that creates AtlasEntities on the fly and then pushes them to the Kafka queue. This functionality
is not part of Atlas as of now.
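A minimal Python sketch of such a Lambda function, assuming the standard S3 event notification layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`). The type name `AWSS3PseudoDir` comes from the proposal quoted below; the qualifiedName scheme, the `user` value, the simplified message body and the `send` callback are assumptions. In a real deployment `send` would wrap a Kafka producer writing to the topic Atlas reads hook messages from (normally ATLAS_HOOK).

```python
import json
from urllib.parse import unquote_plus

# Type name taken from the ATLAS-2708 proposal; the type actually registered
# in your Atlas instance may be named differently.
PSEUDO_DIR_TYPE = "AWSS3PseudoDir"


def pseudo_dir_of(key):
    """Return the pseudo-directory prefix of an S3 object key ('' at top level)."""
    prefix, _, _ = key.rpartition("/")
    return prefix


def entities_from_s3_event(event):
    """Build one AtlasEntity-shaped dict per record in an S3 event notification."""
    entities = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded ('+' for spaces).
        key = unquote_plus(record["s3"]["object"]["key"])
        prefix = pseudo_dir_of(key)
        entities.append({
            "typeName": PSEUDO_DIR_TYPE,
            "attributes": {
                # Hypothetical qualifiedName scheme; any unique string works.
                "qualifiedName": f"s3://{bucket}/{prefix}",
                "name": prefix or bucket,
            },
        })
    return entities


def lambda_handler(event, context, send=print):
    """Lambda entry point. `send` is a stand-in for a Kafka producer call."""
    entities = entities_from_s3_event(event)
    # Simplified, assumed message body; real Atlas hook messages carry a
    # versioned envelope, so check the format for your Atlas version.
    message = {
        "type": "ENTITY_CREATE_V2",
        "user": "s3-lambda",  # hypothetical user name
        "entities": {"entities": entities},
    }
    send(json.dumps(message))
    return len(entities)
```

For the key “myWork/Development/Projects1.xls”, `pseudo_dir_of` yields “myWork/Development”. Wiring `send` to a real producer (for example kafka-python's `KafkaProducer.send`, which takes a topic and a bytes value) is left out so the sketch runs without AWS or Kafka.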

> AWS S3 data lake typedefs for Atlas
> -----------------------------------
>                 Key: ATLAS-2708
>                 URL:
>             Project: Atlas
>          Issue Type: New Feature
>          Components:  atlas-core
>            Reporter: Barbara Eckman
>            Assignee: Barbara Eckman
>            Priority: Critical
>             Fix For: 1.1.0, 2.0.0
>         Attachments: 3010-aws_model.json, ATLAS-2708-2.patch, ATLAS-2708.patch, all_AWS_common_typedefs.json, all_AWS_common_typedefs_v2.json, all_datalake_typedefs.json, all_datalake_typedefs_v2.json
> Currently the base types in Atlas do not include AWS data lake objects. It would be nice to add typedefs for AWS data lake objects (buckets and pseudo-directories) and lineage processes that move the data from another source (e.g., a Kafka topic) to the data lake. For example:
>  * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in an S3 bucket. For example, in the case of an object with key “myWork/Development/Projects1.xls”, “myWork/Development” is the pseudo-directory. It supports:
>  ** Array of Avro schemas that are associated with the data in the pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
>  ** what type of data it contains, e.g., avro, json, unstructured
>  ** time of creation
>  * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of the data in a bucket to a storageClass after a specific time interval, or its expiration. For example, transition to GLACIER after 60 days, or expire (i.e., be deleted) after 90 days. It supports:
>  ** ruleType (e.g., transition or expiration)
>  ** time interval in days before rule is executed  
>  ** storageClass to which the data is transitioned (null if ruleType is expiration)
>  * AWSTag type represents a tag-value pair created by the user and associated with an AWS object.
>  ** tag
>  ** value
>  * AWSCloudWatchMetric type represents a storage or request metric that is monitored by AWS CloudWatch and can be configured for a bucket:
>  ** metricName, for example, “AllRequests”, “GetRequests”, “TotalRequestLatency”
>  ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or limit the monitoring of the metric.
>  * AWSS3Bucket type represents a bucket in an S3 instance.  It supports:
>  ** Array of AWSS3PseudoDirectories that are associated with objects stored in the bucket 
>  ** AWS region
>  ** IsEncrypted (boolean) 
>  ** encryptionType, e.g., AES-256
>  ** S3AccessPolicy, a JSON object expressing access policies, e.g., GetObject, PutObject
>  ** time of creation
>  ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket 
>  ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or its tags or
>  ** Array of AWSTags that are associated with the bucket
>  * Generic dataset2Dataset process to represent movement of data from one dataset to another. It supports:
>  ** array of transforms performed by the process 
>  ** map of tag/value pairs representing configurationParameters of the process
>  ** inputs and outputs are arrays of dataset objects, e.g., Kafka topic and S3 pseudo-directory.
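Some of the types proposed above can be sketched in the JSON shape Atlas uses for typedefs (`structDefs`, `entityDefs`, `attributeDefs`). This is a hypothetical rendering, not the attached patch: the attribute names, the choice of a struct for AWSTag and of the built-in `DataSet` supertype are assumptions, and the Avro schema array, lifecycle rules and CloudWatch metrics are omitted to keep it short.

```python
import json


def attr(name, type_name, optional=True):
    """One Atlas attributeDef; array<...> types get LIST cardinality."""
    return {
        "name": name,
        "typeName": type_name,
        "cardinality": "LIST" if type_name.startswith("array<") else "SINGLE",
        "isOptional": optional,
        "isUnique": False,
        "isIndexable": False,
    }


def build_typedefs():
    """Hypothetical AWSTag, AWSS3PseudoDir and AWSS3Bucket typedefs."""
    tag = {
        "name": "AWSTag",
        "attributeDefs": [attr("tag", "string", optional=False), attr("value", "string")],
    }
    pseudo_dir = {
        "name": "AWSS3PseudoDir",
        "superTypes": ["DataSet"],  # DataSet already supplies name/qualifiedName
        "attributeDefs": [
            attr("dataType", "string"),  # e.g. avro, json, unstructured
            attr("createTime", "date"),
        ],
    }
    bucket = {
        "name": "AWSS3Bucket",
        "superTypes": ["DataSet"],
        "attributeDefs": [
            attr("pseudoDirectories", "array<AWSS3PseudoDir>"),
            attr("region", "string"),
            attr("isEncrypted", "boolean"),
            attr("encryptionType", "string"),  # e.g. AES-256
            attr("s3AccessPolicy", "string"),  # JSON policy stored as text
            attr("createTime", "date"),
            attr("tags", "array<AWSTag>"),
        ],
    }
    return {"structDefs": [tag], "entityDefs": [pseudo_dir, bucket]}


if __name__ == "__main__":
    print(json.dumps(build_typedefs(), indent=2))
```

The printed JSON could then be POSTed to the Atlas `/api/atlas/v2/types/typedefs` REST endpoint; the `attr` helper only exists to avoid repeating the same attributeDef fields for every attribute.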
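The AWSS3BucketLifeCycleRule attributes (ruleType, interval in days, storageClass that is null for expirations) can be illustrated with a small model that picks the action applying to data of a given age. This is a simplified sketch, not how S3 itself evaluates lifecycle configurations; the field names are hypothetical and objects are assumed to start in the STANDARD storage class.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LifeCycleRule:
    """Mirrors the proposed AWSS3BucketLifeCycleRule (field names assumed)."""
    rule_type: str                       # "transition" or "expiration"
    days: int                            # interval in days before the rule runs
    storage_class: Optional[str] = None  # None when rule_type is "expiration"

    def __post_init__(self):
        if self.rule_type not in ("transition", "expiration"):
            raise ValueError(f"unknown ruleType: {self.rule_type}")
        # The proposal says storageClass is null exactly for expiration rules.
        if (self.rule_type == "expiration") != (self.storage_class is None):
            raise ValueError("storageClass must be set for transition rules only")


def action_for_age(rules: List[LifeCycleRule], age_days: int) -> str:
    """Return the storage class (or EXPIRED) implied by the latest due rule."""
    due = [r for r in rules if age_days >= r.days]
    if not due:
        return "STANDARD"  # assumption: objects start in STANDARD
    latest = max(due, key=lambda r: r.days)
    return "EXPIRED" if latest.rule_type == "expiration" else latest.storage_class
```

With the example from the description, `[LifeCycleRule("transition", 60, "GLACIER"), LifeCycleRule("expiration", 90)]`, data 60 to 89 days old maps to GLACIER and anything 90 days or older maps to EXPIRED.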
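A dataset2Dataset lineage instance, such as a Kafka topic feeding an S3 pseudo-directory, might look like the AtlasEntity JSON built below, with inputs and outputs given as AtlasObjectId references by qualifiedName. The sketch assumes dataset2Dataset extends Atlas's built-in Process type (which supplies `inputs` and `outputs`); the `transforms` and `configurationParameters` attribute names and all sample values are hypothetical.

```python
def object_id(type_name, qualified_name):
    """AtlasObjectId referring to an existing entity by its qualifiedName."""
    return {"typeName": type_name, "uniqueAttributes": {"qualifiedName": qualified_name}}


def dataset_to_dataset(name, inputs, outputs, transforms=(), config=None):
    """Build a hypothetical dataset2Dataset process entity."""
    return {
        "typeName": "dataset2Dataset",
        "attributes": {
            "qualifiedName": name,
            "name": name,
            "inputs": list(inputs),       # e.g. kafka_topic references
            "outputs": list(outputs),     # e.g. AWSS3PseudoDir references
            "transforms": list(transforms),
            "configurationParameters": dict(config or {}),
        },
    }


if __name__ == "__main__":
    process = dataset_to_dataset(
        "orders-to-lake",
        inputs=[object_id("kafka_topic", "orders@prod")],
        outputs=[object_id("AWSS3PseudoDir", "s3://lake/orders")],
        transforms=["avro-to-parquet"],
        config={"batchSize": "500"},
    )
    print(process)
```

Referencing datasets by `uniqueAttributes` rather than by GUID lets the process be created without first looking up the topic and pseudo-directory entities.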

This message was sent by Atlassian JIRA
