airflow-commits mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-2420) Add functionality for Azure Data Lake
Date Tue, 15 May 2018 17:32:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476226#comment-16476226
] 

ASF subversion and git services commented on AIRFLOW-2420:
----------------------------------------------------------

Commit 7c233179e91818bd641b283934a73cc84a51ca03 in incubator-airflow's branch refs/heads/master
from [~marcus.rehm@gmail.com]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=7c23317 ]

[AIRFLOW-2420] Azure Data Lake Hook

Add AzureDataLakeHook as a first step to enable Airflow to connect to
Azure Data Lake.

The hook has a simple interface to upload and download files with all
parameters available in the Azure Data Lake SDK, and also a
check_for_file method to query whether a file exists in the data lake.
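
As an illustration of the interface described above, here is a minimal sketch that uses a local directory as a stand-in for the data lake. It is not the hook added in this commit: the class name LocalDataLakeStandIn, the upload_file and download_file method names, and their argument order are assumptions made for the example. Only check_for_file is named in the commit message.

```python
import shutil
import tempfile
from pathlib import Path


class LocalDataLakeStandIn:
    """Hypothetical stand-in mimicking the interface the commit describes.

    A local directory plays the role of the data lake, so no Azure
    credentials or SDK are required. Only ``check_for_file`` is named in
    the commit message; the other method names are assumptions.
    """

    def __init__(self, root):
        self.root = Path(root)

    def _remote(self, remote_path):
        # Map a "remote" path onto the local directory standing in for the lake.
        return self.root / remote_path.lstrip("/")

    def upload_file(self, local_path, remote_path):
        # Copy a local file into the stand-in lake, creating parent folders.
        target = self._remote(remote_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(local_path, target)

    def download_file(self, local_path, remote_path):
        # Copy a file from the stand-in lake back to a local path.
        shutil.copyfile(self._remote(remote_path), local_path)

    def check_for_file(self, file_path):
        # True only if the path exists in the stand-in lake as a regular file.
        return self._remote(file_path).is_file()


def demo():
    """Upload a file, check for it, and download it again."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        lake = LocalDataLakeStandIn(tmp / "lake")
        source = tmp / "report.csv"
        source.write_text("id,value\n1,42\n")

        before = lake.check_for_file("/landing/report.csv")
        lake.upload_file(str(source), "/landing/report.csv")
        after = lake.check_for_file("/landing/report.csv")

        copy = tmp / "copy.csv"
        lake.download_file(str(copy), "/landing/report.csv")
        return before, after, copy.read_text()
```

Because check_for_file returns a plain boolean, a sensor could poll it until the expected file appears.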

[AIRFLOW-2420] Add functionality for Azure Data
Lake

Make sure you have checked _all_ steps below.

### JIRA
- [x] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW-2420) issues and references them in the PR title.
    - https://issues.apache.org/jira/browse/AIRFLOW-2420

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:
       This PR creates an Azure Data Lake hook (adl_hook.AdlHook) and all the setup required to create a new Azure Data Lake connection.

### Tests
- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
       Adds tests for airflow.hooks.adl_hook.py in tests.hooks.test_adl_hook.py

### Commits
- [x] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not
"adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

### Documentation
- [x] In case of new functionality, my PR adds
documentation that describes how to use it.
    - When adding new operators/hooks/sensors, the
autoclass documentation generation needs to be
added.

### Code Quality
- [x] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

Closes #3333 from marcusrehm/master


> Add functionality for Azure Data Lake
> -------------------------------------
>
>                 Key: AIRFLOW-2420
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2420
>             Project: Apache Airflow
>          Issue Type: New Feature
>          Components: hooks
>            Reporter: Marcus Rehm
>            Assignee: Marcus Rehm
>            Priority: Major
>             Fix For: 2.0.0
>
>
> Currently Airflow has a hook for Azure Blob Storage, but it does not support Azure Data Lake.
> As a first step, a hook would interface with Azure Data Lake via the Python SDK over the adl protocol.
>
> The hook would have a simple interface to upload and download files with all parameters available in the ADL SDK, and also a check-for-file function to query whether a file exists in the data lake. This last function will enable sensor development in the future.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
