From: Bowen Li
Date: Tue, 19 Mar 2019 16:15:48 -0700
Subject: [PROGRESS UPDATE] [DISCUSS] Flink-Hive Integration and Catalogs
To: user, dev

Hi Flink users and devs,

We want to get your feedback on integrating Flink with Hive.
Background: At Flink Forward in Beijing last December, the community announced an initiative to integrate Flink and Hive. At the Feb 21 Seattle Flink Meetup, we presented "Integrating Flink with Hive" with a live demo to the local community and got a great response. As of mid-March, we have internally finished building Flink's brand-new catalog infrastructure, metadata integration with Hive, and the most common cases of Flink reading from and writing to Hive, and we will start to submit more design docs/FLIPs and contribute code back to the community. The reason for doing it internally first and then in the community is to ensure our proposed solutions are fully validated and tested, to gain hands-on experience, and to not miss anything in the design. You are very welcome to join this effort, from design/code review to development and testing.

*The most important thing we believe you, our Flink users/devs, can help with RIGHT NOW is to share your Hive use cases and give us feedback on this project. As we start to go deeper on specific areas of integration, your feedback and suggestions will help us refine our backlog and prioritize our work, and you can get the features you want sooner!* Just for example, if most users are mainly reading Hive data, then we can prioritize tuning read performance over implementing write capability.

A quick review of what we've finished building internally and is ready to contribute back to the community:

- Flink/Hive Metadata Integration
  - Unified, pluggable catalog infra that manages meta-objects, including catalogs, databases, tables, views, functions, partitions, and table/partition stats
  - Three catalog impls - an in-memory catalog, HiveCatalog for embracing the Hive ecosystem, and GenericHiveMetastoreCatalog for persisting Flink's streaming/batch metadata in Hive metastore
  - Hierarchical metadata reference as <catalog_name>.<database_name>.<metaobject_name> in SQL and Table API
  - Unified function catalog based on the new catalog infra, also supporting Hive simple UDFs
- Flink/Hive Data Integration
  - Hive data connector that reads partitioned/non-partitioned Hive tables, and supports partition pruning, both Hive simple and complex data types, and basic writes
- More powerful SQL Client fully integrated with the above features, plus more Hive-compatible SQL syntax for a better end-to-end SQL experience

*Given the above info, we want to learn from you: How do you use Hive currently? How can we solve your pain points? What features do you expect from Flink-Hive integration? Those can be details like:*

- *Which Hive version are you using? Do you plan to upgrade Hive?*
- *Are you planning to switch Hive engines? What timeline are you looking at? What capabilities must Flink have before you would consider using Flink with Hive?*
- *What's your motivation to try Flink-Hive? Maintaining only one data processing system across your teams for simplicity and maintainability? Better performance of Flink over Hive itself?*
- *What are your Hive use cases? How large is your Hive data size? Do you mainly do reading, or both reading and writing?*
- *How many Hive user-defined functions do you have? Are they mostly UDF, GenericUDF, UDTF, or UDAF?*
- Any questions or suggestions you have, or simply how you feel about the project

Again, your input will be really valuable to us, and we hope, with all of us working together, the project can benefit our end users. Please feel free to either reply to this thread or just to me. I'm also working on creating a questionnaire to better gather your feedback; watch the mailing list in the next couple of days.

Thanks,
Bowen
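To make the catalog discussion above a bit more concrete, here is a minimal Python sketch of the general idea behind a pluggable catalog registry resolving a hierarchical reference like catalog_name.database_name.metaobject_name. Note this is not Flink's actual API; every class and method name here is invented purely for illustration:

```python
# Hypothetical sketch only - NOT Flink's real catalog API.
# Illustrates a pluggable registry routing fully-qualified
# names of the form catalog.database.object to a catalog impl.

class InMemoryCatalog:
    """Toy catalog keeping {database: {table: schema}} in memory."""
    def __init__(self):
        self._dbs = {}

    def create_table(self, db, table, schema):
        self._dbs.setdefault(db, {})[table] = schema

    def get_table(self, db, table):
        return self._dbs[db][table]


class CatalogManager:
    """Registry that resolves hierarchical references against
    whichever catalog implementation is registered under a name."""
    def __init__(self):
        self._catalogs = {}

    def register(self, name, catalog):
        self._catalogs[name] = catalog

    def resolve(self, qualified_name):
        # Split "catalog.database.object" into its three parts
        # and delegate the lookup to the matching catalog.
        catalog, db, obj = qualified_name.split(".")
        return self._catalogs[catalog].get_table(db, obj)


mgr = CatalogManager()
mem = InMemoryCatalog()
mem.create_table("sales", "orders", {"id": "BIGINT", "amount": "DOUBLE"})
mgr.register("myhive", mem)

schema = mgr.resolve("myhive.sales.orders")
print(schema)  # {'id': 'BIGINT', 'amount': 'DOUBLE'}
```

The point of the sketch is only the shape of the abstraction: because lookups go through a registry keyed by catalog name, an in-memory catalog, a HiveCatalog, or any other implementation can sit behind the same three-part reference syntax.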