airflow-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] XD-DENG commented on a change in pull request #3560: [AIRFLOW-2697] Drop snakebite in favour of hdfs3
Date Wed, 15 Aug 2018 05:23:10 GMT
XD-DENG commented on a change in pull request #3560: [AIRFLOW-2697] Drop snakebite in favour
of hdfs3
URL: https://github.com/apache/incubator-airflow/pull/3560#discussion_r210173489
 
 

 ##########
 File path: airflow/sensors/hdfs_sensor.py
 ##########
 @@ -17,103 +17,231 @@
 # specific language governing permissions and limitations
 # under the License.
 
-import re
-import sys
-from builtins import str
+import posixpath
 
 from airflow import settings
-from airflow.hooks.hdfs_hook import HDFSHook
+from airflow.hooks.hdfs_hook import HdfsHook
 from airflow.sensors.base_sensor_operator import BaseSensorOperator
 from airflow.utils.decorators import apply_defaults
-from airflow.utils.log.logging_mixin import LoggingMixin
 
 
-class HdfsSensor(BaseSensorOperator):
-    """
-    Waits for a file or folder to land in HDFS
+class HdfsFileSensor(BaseSensorOperator):
+    """Sensor that waits for files matching a specific (glob) pattern to land in HDFS.
+
+    :param str file_pattern: Glob pattern to match.
+    :param str conn_id: Connection to use.
+    :param Iterable[FilePathFilter] filters: Optional list of filters that can be
+        used to apply further filtering to any file paths matching the glob pattern.
+        Any files that fail a filter are dropped from consideration.
+    :param int min_size: Minimum size (in MB) for files to be considered. Can be used
+        to filter any intermediate files that are below the expected file size.
+    :param Set[str] ignore_exts: File extensions to ignore. By default, files with
+        a '_COPYING_' extension are ignored, as these represent temporary files.
 
 Review comment:
   Hi @jrderuiter @Fokko, I think it would be good to tell users explicitly what format
`ignore_exts` should take. For example, both `{'.py', '.exe'}` and `{'py', 'exe'}` look valid,
but only `{'py', 'exe'}` (no leading dot) would work here.
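
An alternative to documenting the format would be to accept both forms by normalizing the extensions before comparing. A minimal sketch of that idea follows; `normalize_exts` and `has_ignored_ext` are hypothetical helpers for illustration and are not part of this PR:

```python
import posixpath


def normalize_exts(exts):
    """Strip leading dots so '.py' and 'py' are treated the same (hypothetical helper)."""
    return {ext.lstrip('.') for ext in exts}


def has_ignored_ext(path, ignore_exts):
    """Return True if the extension of `path` is in `ignore_exts` (hypothetical helper)."""
    # posixpath.splitext returns the extension with its leading dot, e.g. '.py',
    # so the dot is stripped here as well before comparing.
    ext = posixpath.splitext(path)[1].lstrip('.')
    return ext in normalize_exts(ignore_exts)
```

With this, `has_ignored_ext('/data/part-0.txt._COPYING_', {'_COPYING_'})` and `has_ignored_ext('/data/job.py', {'.py'})` would both match, so users would not need to know which spelling the sensor expects.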
