airflow-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-2981) TypeError in dataflow operators when using GCS jar or py_file
Date Sat, 01 Sep 2018 10:34:00 GMT

[ https://issues.apache.org/jira/browse/AIRFLOW-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599603#comment-16599603 ]

ASF GitHub Bot commented on AIRFLOW-2981:
-----------------------------------------

kaxil closed pull request #3831: [AIRFLOW-2981] Fix TypeError in dataflow operators
URL: https://github.com/apache/incubator-airflow/pull/3831
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/airflow/contrib/operators/dataflow_operator.py b/airflow/contrib/operators/dataflow_operator.py
index 3f6093b3ba..980b5792e7 100644
--- a/airflow/contrib/operators/dataflow_operator.py
+++ b/airflow/contrib/operators/dataflow_operator.py
@@ -16,7 +16,7 @@
 # KIND, either express or implied.  See the License for the
 # specific language governing permissions and limitations
 # under the License.
-
+import os
 import re
 import uuid
 import copy
@@ -358,7 +358,7 @@ def google_cloud_to_local(self, file_name):
         # Extracts bucket_id and object_id by first removing 'gs://' prefix and
         # then split the remaining by path delimiter '/'.
         path_components = file_name[self.GCS_PREFIX_LENGTH:].split('/')
-        if path_components < 2:
+        if len(path_components) < 2:
             raise Exception(
                 'Invalid Google Cloud Storage (GCS) object path: {}.'
                 .format(file_name))
@@ -369,7 +369,7 @@ def google_cloud_to_local(self, file_name):
                                                  path_components[-1])
         file_size = self._gcs_hook.download(bucket_id, object_id, local_file)
 
-        if file_size > 0:
+        if os.stat(file_size).st_size > 0:
             return local_file
         raise Exception(
             'Failed to download Google Cloud Storage GCS object: {}'
diff --git a/tests/contrib/operators/test_dataflow_operator.py b/tests/contrib/operators/test_dataflow_operator.py
index 4ea5f65698..a373126b24 100644
--- a/tests/contrib/operators/test_dataflow_operator.py
+++ b/tests/contrib/operators/test_dataflow_operator.py
@@ -20,9 +20,10 @@
 
 import unittest
 
-from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator, \
-    DataFlowJavaOperator, DataflowTemplateOperator
-from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator
+from airflow.contrib.operators.dataflow_operator import \
+    DataFlowPythonOperator, DataFlowJavaOperator, \
+    DataflowTemplateOperator, GoogleCloudBucketHelper
+
 from airflow.version import version
 
 try:
@@ -186,3 +187,25 @@ def test_exec(self, dataflow_mock):
         }
         start_template_hook.assert_called_once_with(TASK_ID, expected_options,
                                                     PARAMETERS, TEMPLATE)
+
+
+class GoogleCloudBucketHelperTest(unittest.TestCase):
+
+    @mock.patch(
+        'airflow.contrib.operators.dataflow_operator.GoogleCloudBucketHelper.__init__'
+    )
+    def test_invalid_object_path(self, mock_parent_init):
+
+        # This is just the path of a bucket hence invalid filename
+        file_name = 'gs://test-bucket'
+        mock_parent_init.return_value = None
+
+        gcs_bucket_helper = GoogleCloudBucketHelper()
+        gcs_bucket_helper._gcs_hook = mock.Mock()
+
+        with self.assertRaises(Exception) as context:
+            gcs_bucket_helper.google_cloud_to_local(file_name)
+
+        self.assertEquals(
+            'Invalid Google Cloud Storage (GCS) object path: {}.'.format(file_name),
+            str(context.exception))
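
Two details of the merged diff are worth flagging. First, the new check
passes the hook's return value to os.stat(): file_size holds whatever
self._gcs_hook.download(...) returns, while os.stat() expects a filesystem
path (or an integer file descriptor), so the check only works if that return
value happens to be path-like. A more defensive variant would stat the local
path itself; a minimal sketch, assuming the hook has already written the
object to local_file (an illustration, not the merged code):

    import os


    def _local_copy_or_raise(local_file, file_name):
        # Illustrative helper (not part of the merged PR): stat the path
        # that was just written to disk instead of the hook's return value,
        # since os.stat() takes a path or a file descriptor, not a byte
        # count or the downloaded content.
        if os.stat(local_file).st_size > 0:
            return local_file
        raise Exception(
            'Failed to download Google Cloud Storage GCS object: {}'
            .format(file_name))

Second, the new test calls self.assertEquals, a long-deprecated alias of
self.assertEqual in unittest; it still passes but raises a
DeprecationWarning on modern Python versions.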


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


>  TypeError in dataflow operators when using GCS jar or py_file
> --------------------------------------------------------------
>
>                 Key: AIRFLOW-2981
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2981
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: contrib, Dataflow
>    Affects Versions: 1.9.0, 1.10
>            Reporter: Jeffrey Payne
>            Assignee: Kaxil Naik
>            Priority: Major
>
> The {{GoogleCloudBucketHelper.google_cloud_to_local}} function attempts to compare a
> list to an int, resulting in a TypeError:
> {noformat}
> ...
> path_components = file_name[self.GCS_PREFIX_LENGTH:].split('/')
> if path_components < 2:
> ...
> {noformat}
> This should be {{if len(path_components) < 2:}}.
> Also, fix {{if file_size > 0:}} in the same function...
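
For context, the comparison bug is easy to reproduce in isolation. A
standalone sketch follows; the gs:// path is made up for illustration and
this is not Airflow code:

    # Standalone reproduction of the bug reported in AIRFLOW-2981.
    file_name = 'gs://test-bucket/path/to/file.jar'  # illustrative path
    GCS_PREFIX_LENGTH = len('gs://')

    path_components = file_name[GCS_PREFIX_LENGTH:].split('/')

    try:
        # Buggy check: comparing a list to an int raises
        # "TypeError: '<' not supported between instances of 'list' and 'int'"
        # on Python 3; under CPython 2's mixed-type ordering it silently
        # evaluated to False, so the validation never fired.
        if path_components < 2:
            pass
    except TypeError as exc:
        print(exc)

    # Fixed check: compare the number of path components instead.
    if len(path_components) < 2:
        raise Exception(
            'Invalid Google Cloud Storage (GCS) object path: {}.'
            .format(file_name))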



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
