From: "t oo (JIRA)"
To: dev@atlas.apache.org
Reply-To: dev@atlas.apache.org
Date: Tue, 25 Sep 2018 00:03:00 +0000 (UTC)
Subject: [jira] [Commented] (ATLAS-2708) AWS S3 data lake typedefs for Atlas
Mailing-List: contact dev-help@atlas.apache.org; run by ezmlm

    [ https://issues.apache.org/jira/browse/ATLAS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626588#comment-16626588 ]

t oo commented on ATLAS-2708:
-----------------------------

tracking further work in ATLAS-2889

> AWS S3 data lake typedefs for Atlas
> -----------------------------------
>
>                 Key: ATLAS-2708
>                 URL: https://issues.apache.org/jira/browse/ATLAS-2708
>             Project: Atlas
>          Issue Type: New Feature
>          Components: atlas-core
>            Reporter: Barbara Eckman
>            Assignee: Barbara Eckman
>            Priority: Critical
>             Fix For: 1.1.0, 2.0.0
>
>         Attachments: 3010-aws_model.json, ATLAS-2708-2.patch, ATLAS-2708.patch, all_AWS_common_typedefs.json, all_AWS_common_typedefs_v2.json, all_datalake_typedefs.json, all_datalake_typedefs_v2.json
>
>
> Currently the base types in Atlas do not include AWS data lake objects. It would be nice to add typedefs for AWS data lake objects (buckets and pseudo-directories) and lineage processes that move the data from another source (e.g., kafka topic) to the data lake.  For example:
> * AWSS3PseudoDir type represents the pseudo-directory “prefix” of objects in an S3 bucket.
>   For example, in the case of an object with key “myWork/Development/Projects1.xls”, “myWork/Development” is the pseudo-directory.  It supports:
> ** Array of avro schemas that are associated with the data in the pseudo-directory (based on Avro schema extensions outlined in ATLAS-2694)
> ** what type of data it contains, e.g., avro, json, unstructured
> ** time of creation
> * AWSS3BucketLifeCycleRule type represents a rule specifying a transition of the data in a bucket to a storageClass after a specific time interval, or expiration.  For example, transition to GLACIER after 60 days, or expire (i.e. be deleted) after 90 days:
> ** ruleType (e.g., transition or expiration)
> ** time interval in days before rule is executed
> ** storageClass to which the data is transitioned (null if ruleType is expiration)
> * AWSTag type represents a tag-value pair created by the user and associated with an AWS object.
> ** tag
> ** value
> * AWSCloudWatchMetric type represents a storage or request metric that is monitored by AWS CloudWatch and can be configured for a bucket
> ** metricName, for example, “AllRequests”, “GetRequests”, TotalRequestLatency, BucketSizeBytes
> ** scope: null if entire bucket; otherwise, the prefixes/tags that filter or limit the monitoring of the metric.
> * AWSS3Bucket type represents a bucket in an S3 instance.
>   It supports:
> ** Array of AWSS3PseudoDirectories that are associated with objects stored in the bucket
> ** AWS region
> ** IsEncrypted (boolean)
> ** encryptionType, e.g., AES-256
> ** S3AccessPolicy, a JSON object expressing access policies, e.g., GetObject, PutObject
> ** time of creation
> ** Array of AWSS3BucketLifeCycleRules that are associated with the bucket
> ** Array of AWSS3CloudWatchMetrics that are associated with the bucket or its tags or prefixes
> ** Array of AWSTags that are associated with the bucket
> * Generic dataset2Dataset process to represent movement of data from one dataset to another.  It supports:
> ** array of transforms performed by the process
> ** map of tag/value pairs representing configurationParameters of the process
> ** inputs and outputs are arrays of dataset objects, e.g., kafka topic and S3 pseudo-directory.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
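
For readers who want to see the shape of the proposed types, here is a rough sketch of part of the model described in the issue, written as plain Python dataclasses. It is not the Atlas typedef JSON from the attachments. The field names (rule_type, days, storage_class, region and so on) and the helper pseudo_dir_of are illustrative assumptions; only the type names and the example values come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def pseudo_dir_of(key: str) -> str:
    """Return the pseudo-directory "prefix" of an S3 object key.

    Hypothetical helper: everything before the last "/" is treated as
    the pseudo-directory; a key with no "/" has an empty prefix.
    """
    head, _, _ = key.rpartition("/")
    return head


@dataclass
class AWSS3BucketLifeCycleRule:
    rule_type: str                       # "transition" or "expiration"
    days: int                            # interval before the rule is executed
    storage_class: Optional[str] = None  # None when rule_type is "expiration"

    def __post_init__(self) -> None:
        if self.rule_type not in ("transition", "expiration"):
            raise ValueError(f"unknown rule_type: {self.rule_type!r}")
        # The description says storageClass is null for expiration rules.
        if self.rule_type == "expiration" and self.storage_class is not None:
            raise ValueError("expiration rules carry no storage_class")
        # Assumption: a transition rule needs a target storage class.
        if self.rule_type == "transition" and self.storage_class is None:
            raise ValueError("transition rules need a storage_class")


@dataclass
class AWSTag:
    tag: str
    value: str


@dataclass
class AWSS3Bucket:
    name: str                            # assumed attribute, not in the list above
    region: str
    is_encrypted: bool = False
    encryption_type: Optional[str] = None
    pseudo_dirs: List[str] = field(default_factory=list)
    lifecycle_rules: List[AWSS3BucketLifeCycleRule] = field(default_factory=list)
    tags: List[AWSTag] = field(default_factory=list)


# The two lifecycle examples from the description.
to_glacier = AWSS3BucketLifeCycleRule("transition", 60, "GLACIER")
expire = AWSS3BucketLifeCycleRule("expiration", 90)

bucket = AWSS3Bucket(
    name="example-bucket",               # hypothetical bucket name
    region="us-east-1",                  # hypothetical region
    is_encrypted=True,
    encryption_type="AES-256",
    pseudo_dirs=[pseudo_dir_of("myWork/Development/Projects1.xls")],
    lifecycle_rules=[to_glacier, expire],
    tags=[AWSTag("team", "data-platform")],
)
```

The validation in __post_init__ encodes only the one constraint the description states outright (storageClass is null for expiration rules) plus one labeled assumption; the actual typedefs may leave both unchecked.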