From: "Siddharth (JIRA)"
To: commits@airflow.incubator.apache.org
Date: Wed, 6 Sep 2017 04:18:03 +0000 (UTC)
Subject: [jira] [Updated] (AIRFLOW-1560) Add AWS DynamoDB hook for inserting batch items
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

     [ https://issues.apache.org/jira/browse/AIRFLOW-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth updated AIRFLOW-1560:
-------------------------------

    Description: 
This PR adds Airflow integration with AWS DynamoDB. Currently there is no hook for interacting with DynamoDB to read or write items (single or batch insertions). To get started, we want to push data into DynamoDB from Airflow jobs scheduled daily: the idea is to read aggregates from Hive and write them to DynamoDB (the write job will run every day). First we create a DynamoDB hook (which this PR addresses), and then an operator to move data from Hive to DynamoDB (a Hive-to-DynamoDB transfer operator has been added).
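To make the plan concrete, here is a minimal sketch of such a hook. This is illustrative only: the names AwsDynamoDBHook and write_batch_data are placeholders for this example, and the sketch calls boto3 directly where the real hook would route credentials through the parent AWS hook described below.

{code:python}
import boto3

from airflow.contrib.hooks.aws_hook import AwsHook


class AwsDynamoDBHook(AwsHook):
    """Interact with AWS DynamoDB via the boto3 Resource API (sketch)."""

    def __init__(self, table_name=None, region_name=None,
                 aws_conn_id='aws_default'):
        self.table_name = table_name
        self.region_name = region_name
        super(AwsDynamoDBHook, self).__init__(aws_conn_id=aws_conn_id)

    def get_conn(self):
        # Sketch only: a plain boto3 resource; a real hook would pull
        # credentials from the Airflow connection via the parent AwsHook.
        return boto3.resource('dynamodb', region_name=self.region_name)

    def write_batch_data(self, items):
        # batch_writer() buffers put_item calls into BatchWriteItem
        # requests (up to 25 items each) and automatically resends any
        # unprocessed items, so callers just pass a list of dicts.
        table = self.get_conn().Table(self.table_name)
        with table.batch_writer() as batch:
            for item in items:
                batch.put_item(Item=item)
        return True
{code}

A daily task could then call something like AwsDynamoDBHook(table_name='my_table').write_batch_data(rows) with the aggregates read from Hive ('my_table' and 'rows' being placeholders).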
I noticed that Airflow already has AWS_HOOK, the parent hook for connecting to AWS using credentials stored in configs. It has a function for connecting to AWS objects through the Client API (http://boto3.readthedocs.io/en/latest/reference/services/dynamodb.html#client), which is currently used only by EMR_HOOK. For inserting data, however, we can use the DynamoDB Resource API (http://boto3.readthedocs.io/en/latest/reference/services/dynamodb.html#service-resource), which provides higher-level abstractions for writing to DynamoDB. A fair question is what the difference between a client and a resource is, and why you would use one over the other: "Resources are higher-level abstraction than the raw, low-level calls made by service clients. They can't do anything the clients can't do, but in many cases they are nicer to use. The downside is that they don't always support 100% of the features of a service." (http://boto3.readthedocs.io/en/latest/guide/resources.html)
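To make that difference concrete, here is a small side-by-side of the same write through both APIs (the table name 'my_table' is just a placeholder):

{code:python}
import boto3

# Low-level client: mirrors the wire protocol, so every attribute
# value carries an explicit DynamoDB type tag ('S', 'N', ...).
client = boto3.client('dynamodb', region_name='us-east-1')
client.put_item(
    TableName='my_table',
    Item={'id': {'S': '1'}, 'total': {'N': '42'}},
)

# Resource API: plain Python types are serialized automatically,
# which is what makes batch-inserting Hive rows pleasant to write.
table = boto3.resource('dynamodb', region_name='us-east-1').Table('my_table')
table.put_item(Item={'id': '1', 'total': 42})
{code}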
was: (the previous description was identical except that it read "S3" where the updated text reads "Hive")


> Add AWS DynamoDB hook for inserting batch items
> -----------------------------------------------
>
>                 Key: AIRFLOW-1560
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1560
>             Project: Apache Airflow
>          Issue Type: New Feature
>          Components: aws, boto3, hooks
>            Reporter: Siddharth
>            Assignee: Siddharth
>
> (The quoted issue body repeats the updated description above.)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)