atlas-dev mailing list archives

From Rémy SAISSY (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (ATLAS-164) DFS addon for Atlas
Date Fri, 25 Sep 2015 13:48:04 GMT

    [ https://issues.apache.org/jira/browse/ATLAS-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906536#comment-14906536 ]

Rémy SAISSY edited comment on ATLAS-164 at 9/25/15 1:47 PM:
------------------------------------------------------------

Hi [~svenkat],
I have read about and thought through lineage and datasets, and here is how I see it.
I think modeling HDFS this way lets us keep track of only the information that matters most
for governance, and also enables proper lineage of the actions performed on datasets.
Please let me know if this makes sense to you in the context of Atlas.
Thanks.

+ : supertype
+- : type
- : metadata

+ dataset
  +- directory
    - name
    - description
    - path
    - number of files
    - total size
    - list of: owner,group,permission
    - list of files (and symlinks) -- (is it really useful?)
    - list of subdirectories -- (is it really useful?)

  +- file -- (is it really useful?)
    - name
    - description
    - path
    - size
    - list of: owner,group,permission

+ process
  +- create_file
    - owner
    - group
    - timestamp
    - target directory -- even if it is on a file, we log the parent directory
    - created_file -- only if the file dataset is useful to us

  +- delete_file
    - owner
    - group
    - timestamp
    - target directory -- even if it is on a file, we log the parent directory
    - created_file -- only if the file dataset is useful to us.

  +- append_file
    - owner
    - group
    - timestamp
    - target directory -- even if it is on a file, we log the parent directory
    - created_file -- only if the file dataset is useful to us.

  +- create_dir
    - owner
    - group
    - timestamp
    - target directory

  +- delete_dir
    - owner
    - group
    - timestamp
    - target directory
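
To make the outline above a bit more concrete, here is a minimal sketch of it as plain
Python dataclasses. It is only an illustration of the shape of the model, not Atlas type
system code: the class names, the field names, the timestamp unit and the two helper
functions (parent_directory, lineage) are assumptions of mine, and only two of the five
process types are spelled out.

```python
import posixpath
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Access:
    """One owner/group/permission entry of a dataset."""
    owner: str
    group: str
    permission: str  # assumed format, e.g. "rwxr-x---"


@dataclass
class DataSet:
    """Supertype: metadata shared by directories and files."""
    name: str
    description: str
    path: str
    access: List[Access] = field(default_factory=list)


@dataclass
class Directory(DataSet):
    number_of_files: int = 0
    total_size: int = 0  # assumed unit: bytes


@dataclass
class File(DataSet):
    size: int = 0  # assumed unit: bytes


@dataclass
class Process:
    """Supertype: an action performed on a dataset."""
    owner: str
    group: str
    timestamp: int  # assumed unit: milliseconds since the epoch
    target_directory: Directory


@dataclass
class CreateFile(Process):
    # Only filled in if file datasets turn out to be worth tracking.
    created_file: Optional[File] = None


@dataclass
class CreateDir(Process):
    pass


def parent_directory(path: str) -> str:
    """Return the directory a file event is logged against."""
    return posixpath.dirname(path.rstrip("/")) or "/"


def lineage(events: List[Process], directory_path: str) -> List[Process]:
    """Return the events that targeted a directory, oldest first."""
    matching = [e for e in events if e.target_directory.path == directory_path]
    return sorted(matching, key=lambda e: e.timestamp)
```

With this shape, the lineage of a directory is just the time-ordered list of process
entities whose target directory is that directory, which is why file-level events are
logged against the parent directory.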




> DFS addon for Atlas
> -------------------
>
>                 Key: ATLAS-164
>                 URL: https://issues.apache.org/jira/browse/ATLAS-164
>             Project: Atlas
>          Issue Type: New Feature
>    Affects Versions: 0.6-incubating
>            Reporter: Rémy SAISSY
>            Assignee: Rémy SAISSY
>         Attachments: ATLAS-164.15092015.patch, ATLAS-164.15092015.patch
>
>
> Hi,
> I have written an addon for sending DFS metadata into Atlas.
> The patch is attached.
> However, I am having a hard time getting the unit tests to work properly, so some advice
> would be welcome.
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
