airflow-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] mikemole commented on a change in pull request #4112: [AIRFLOW-3212] Add AwsGlueCatalogPartitionSensor
Date Tue, 30 Oct 2018 17:45:43 GMT
mikemole commented on a change in pull request #4112: [AIRFLOW-3212] Add AwsGlueCatalogPartitionSensor
URL: https://github.com/apache/incubator-airflow/pull/4112#discussion_r229415707
 
 

 ##########
 File path: airflow/contrib/hooks/aws_glue_catalog_hook.py
 ##########
 @@ -0,0 +1,117 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+from airflow.contrib.hooks.aws_hook import AwsHook
+
+
+class AwsGlueCatalogHook(AwsHook):
+    """
+    Interact with AWS Glue Catalog
+
+    :param aws_conn_id: ID of the Airflow connection where
+        credentials and extra configuration are stored
+    :type aws_conn_id: str
+    :param region_name: aws region name (example: us-east-1)
+    :type region_name: str
+    """
+
+    def __init__(self,
+                 aws_conn_id='aws_default',
+                 region_name=None,
+                 *args,
+                 **kwargs):
+        self.region_name = region_name
+        super(AwsGlueCatalogHook, self).__init__(aws_conn_id=aws_conn_id, *args, **kwargs)
+
+    def get_conn(self):
+        """
+        Returns glue connection object.
+        """
+        self.conn = self.get_client_type('glue', self.region_name)
+        return self.conn
+
+    def get_partitions(self,
+                       database_name,
+                       table_name,
+                       expression='',
+                       page_size=None,
+                       max_items=None):
+        """
+        Retrieves the partition values for a table.
+        :param database_name: The name of the catalog database where the partitions reside.
+        :type database_name: str
+        :param table_name: The name of the partitions' table.
+        :type table_name: str
+        :param expression: An expression filtering the partitions to be returned.
+            Please see official AWS documentation for further information.
+            https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-partitions.html#aws-glue-api-catalog-partitions-GetPartitions
+        :type expression: str
+        :param page_size: pagination size
+        :type page_size: int
+        :param max_items: maximum items to return
+        :type max_items: int
+        :return: array of partition values where each value is itself an array as
+            a partition may be composed of multiple columns. For example:
+        [['2018-01-01','1'], ['2018-01-01','2']]
+        """
+        config = {
+            'PageSize': page_size,
+            'MaxItems': max_items,
+        }
+
+        paginator = self.get_conn().get_paginator('get_partitions')
+        response = paginator.paginate(
+            DatabaseName=database_name,
+            TableName=table_name,
+            Expression=expression,
+            PaginationConfig=config
+        )
+
+        partitions = []
 
 Review comment:
   OK, I switched to a set.  I also changed the partition values from lists to tuples, since
lists are not hashable and therefore can't be added to a set.  A tuple makes more sense anyway.
 I added a commit so you can see just the latest changes.  If it looks OK, let me know,
and I'll squash the commits.
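
For readers following along, here is a minimal sketch of the list-versus-tuple point, not the code in the PR. The helper name `collect_partition_values` and the hand-written `fake_pages` are hypothetical. The pages only assume that each one has a `Partitions` list whose entries carry a `Values` list, as in Glue's GetPartitions response, so no AWS call is made.

```python
def collect_partition_values(pages):
    """Gather partition values from GetPartitions-style pages into a set.

    `pages` is any iterable of dicts shaped like
    {'Partitions': [{'Values': [...]}, ...]}.
    """
    partitions = set()
    for page in pages:
        for partition in page.get('Partitions', []):
            # Lists are unhashable, so convert each value list to a tuple
            # before adding it to the set.
            partitions.add(tuple(partition['Values']))
    return partitions


# Hypothetical stand-in for paginated responses; the second page repeats
# a partition to show that the set removes duplicates.
fake_pages = [
    {'Partitions': [{'Values': ['2018-01-01', '1']},
                    {'Values': ['2018-01-01', '2']}]},
    {'Partitions': [{'Values': ['2018-01-01', '2']}]},
]
result = collect_partition_values(fake_pages)

# Putting a list straight into a set fails, which is why the tuple
# conversion above is needed.
try:
    {['2018-01-01', '1']}
    list_error = None
except TypeError as exc:
    list_error = str(exc)
```

With a set of tuples, a caller can also test membership directly, for example `('2018-01-01', '1') in result`.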

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
