airflow-commits mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-715) HDFS Sensor Should be more effective
Date Sat, 31 Dec 2016 13:02:58 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15789483#comment-15789483 ]

ASF subversion and git services commented on AIRFLOW-715:
---------------------------------------------------------

Commit 1c4cff056488623cfd3a6ec411e680e3e5198b21 in incubator-airflow's branch refs/heads/master
from [~vfoucault]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=1c4cff0 ]

[AIRFLOW-715] A more efficient HDFS Sensor:

The HDFS Sensor can now return true based on
a file size, a directory status (empty or not),
or a regex matching files in a directory, and it
can ignore files that are still being copied.

With the base HDFS Sensor, it was not possible to
watch a directory for files with an
unknown name.

The HDFS Sensor is now extended with (contrib):

  - HdfsSensorRegex : matches files with a regex
(re)
  - HdfsSensorFolder : matches on a directory's
status (empty or not)
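
To illustrate the two kinds of check described above, here is a minimal sketch. It is not the code from this commit. It assumes a directory listing is a list of dicts with a 'path' key (as a snakebite-style `ls` would return), and that the regex is applied to the file's base name; the helper names match_by_regex and directory_is_empty are hypothetical.

```python
import posixpath
import re


def match_by_regex(entries, regex):
    """Return the entries whose base name matches the compiled regex.

    Assumption: each entry is a dict with a 'path' key holding an HDFS path.
    """
    return [e for e in entries if regex.match(posixpath.basename(e["path"]))]


def directory_is_empty(entries):
    """Return True when the directory listing contains no entries."""
    return len(entries) == 0


# Hypothetical listing of /data/in, for illustration only.
listing = [
    {"path": "/data/in/report_20161231.csv"},
    {"path": "/data/in/notes.txt"},
]
pattern = re.compile(r"^report_\d{8}\.csv$")
matched = match_by_regex(listing, pattern)
```

A regex-based sensor would succeed once `matched` is non-empty, which is what makes it possible to wait for a file whose exact name is not known in advance.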

The HDFS Sensor now has two built-in filters:

  - filter_for_filesize : filters the listing result
by file size
  - filter_for_ignored_ext : optionally discards
files that are still being copied
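
The two filters can be sketched as follows. This is an illustrative approximation, not the committed implementation. The filter names come from the message above, but the signatures are assumptions: entries are taken to be dicts with 'path' and 'length' (bytes) keys, the size threshold is taken to be in megabytes, and '_COPYING_' is assumed as the extension that marks a file still being copied.

```python
import re

MEGABYTE = 1024 * 1024  # assumption: the size threshold is given in megabytes


def filter_for_filesize(entries, size_mb=None):
    """Keep only entries of at least size_mb megabytes (no-op if size_mb is unset)."""
    if not size_mb:
        return list(entries)
    min_bytes = size_mb * MEGABYTE
    return [e for e in entries if e["length"] >= min_bytes]


def filter_for_ignored_ext(entries, ignored_ext=("_COPYING_",), ignore_copying=True):
    """Drop entries whose path ends with one of the ignored extensions."""
    if not ignore_copying or not ignored_ext:
        return list(entries)
    alternatives = "|".join(re.escape(ext) for ext in ignored_ext)
    pattern = re.compile(r".*\.(?:%s)$" % alternatives)
    return [e for e in entries if not pattern.match(e["path"])]


# Hypothetical listing, for illustration only.
listing = [
    {"path": "/data/in/a.csv", "length": 5 * MEGABYTE},
    {"path": "/data/in/b.csv._COPYING_", "length": 9 * MEGABYTE},
    {"path": "/data/in/c.csv", "length": 10},
]
ready = filter_for_filesize(filter_for_ignored_ext(listing), size_mb=1)
```

Here only a.csv survives both filters: b.csv is still being copied and c.csv is below the size threshold.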

Unit tests added with a new FakeSnakebite client
and a FakeHdfsHook

Closes #1957 from vfoucault/feature/AIRFLOW-715


> HDFS Sensor Should be more effective
> ------------------------------------
>
>                 Key: AIRFLOW-715
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-715
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>    Affects Versions: Airflow 2.0, Airflow 1.7.1
>         Environment: HDFS Sensor should be more effective
>            Reporter: Vianney FOUCAULT
>            Assignee: Vianney FOUCAULT
>            Priority: Minor
>             Fix For: Airflow 1.8
>
>
> As an Airflow user, I want the HDFS Sensor to be more effective: aware of file size, able to match a regex against file names, and aware of empty directories.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
