From: "Driesprong, Fokko"
Date: Sat, 14 Oct 2017 10:53:19 +0200
Subject: Re: Return results optionally from spark_sql_hook
To: dev@airflow.incubator.apache.org

Hi Boris,

Thank you for your question, and excuse the late response; I'm currently on holiday.

The solution that you suggest would not be my preferred choice. Extracting results from a log using a regex is computationally expensive and error-prone.

My question is: what are you trying to accomplish? For me there are two ways of using the Spark-sql operator:

1. ETL using Spark: instead of returning the results, write them back to a new table, or to a new partition within the table. This data can be used downstream in the DAG. It also writes the data to HDFS, which is nice for persistence.
2. Write the data in a simple and widely supported format (such as CSV) onto HDFS. You can then fetch it to your local file system using `hdfs dfs -get`, or use `hdfs dfs -cat ... | application.py` to pipe it to your application directly.

What you are trying to accomplish looks to me like something that would fit a spark-submit job, where you can submit PySpark applications and fetch the results directly from Spark:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Python version 2.7.14 (default, Oct 11 2017 10:13:33)
SparkSession available as 'spark'.
>>> spark.sql("SELECT 1 as count").first()
Row(count=1)

Most of the time we use Spark-sql to transform the data, then use Sqoop to move the data from HDFS into an RDBMS to expose it to the business.

These examples are for Spark using HDFS, but for S3 it is much the same.

Does this answer your question? If not, could you elaborate on the problem that you are facing?

Ciao, Fokko

2017-10-13 15:54 GMT+02:00 Boris:

> hi guys,
>
> I opened JIRA on this and will be working on PR
> https://issues.apache.org/jira/browse/AIRFLOW-1713
>
> any objections/suggestions conceptually?
>
> Fokko, I see you have been actively contributing to spark hooks and
> operators so I could use your opinion before I implement this.
>
> Boris
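
For the second option in the reply (`hdfs dfs -cat ... | application.py`), the receiving side could look roughly like the sketch below. It is only an illustration: the contents of application.py are not given in the thread, so the function name `read_rows` and the assumption that the CSV starts with a single header line are hypothetical.

```python
import csv


def read_rows(lines):
    """Parse CSV lines into a list of dicts keyed by column name.

    `lines` is any iterable of text lines, for example sys.stdin when the
    script is fed by a pipe. Assumption: the first line is a header row.
    """
    return [dict(row) for row in csv.DictReader(lines)]
```

Inside application.py, calling `read_rows(sys.stdin)` (after `import sys`) would consume whatever the pipe delivers. All values come back as strings, so numeric columns need an explicit conversion such as `int(row["count"])`.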