flink-user-zh mailing list archives

From chai <chaiy...@didachuxing.com>
Subject Re: [PROGRESS UPDATE] [DISCUSS] Flink-Hive Integration and Catalogs
Date Wed, 20 Mar 2019 07:25:12 GMT
Here is my production environment: CDH 5.9 with Hive 1.2.1. Hive 2.3.4 is too
new for me.

> On Mar 20, 2019, at 11:44, Shaoxuan Wang <wshaoxuan@gmail.com> wrote:
> Hi Bowen,
> Thanks for driving this. I am CCing this email/survey to
> user-zh@flink.apache.org as well.
> I hear there is a lot of interest in Flink-Hive from the field. One of
> the biggest requests Hive users have raised is "support for
> out-of-date Hive versions". A large number of users are still working on
> clusters with CDH/HDP installed with old Hive versions, say 1.2.1/2.1.1. We
> need to ensure support for these Hive versions when planning the work on
> Flink-Hive integration.
> *@all. "We want to get your feedback on Flink-Hive integration." *
> Regards,
> Shaoxuan
> On Wed, Mar 20, 2019 at 7:16 AM Bowen Li <bowenli86@gmail.com> wrote:
>> Hi Flink users and devs,
>> We want to get your feedback on integrating Flink with Hive.
>> Background: At Flink Forward in Beijing last December, the community
>> announced an effort to integrate Flink and Hive. At the Feb 21 Seattle
>> Flink Meetup <https://www.meetup.com/seattle-flink/events/258723322/>, we
>> presented Integrating Flink with Hive
>> <https://www.slideshare.net/BowenLi9/integrating-flink-with-hive-xuefu-zhang-and-bowen-li-seattle-flink-meetup-feb-2019>
>> with a live demo to the local community and got a great response. As of mid-March,
>> we have internally finished building Flink's brand-new catalog
>> infrastructure, metadata integration with Hive, and the most common cases of
>> Flink reading/writing against Hive, and will start to submit more design
>> docs/FLIPs and contribute code back to the community. The reason for doing it
>> internally first and then in the community is to ensure our proposed solutions
>> are fully validated and tested, to gain hands-on experience, and to not miss
>> anything in the design. You are very welcome to join this effort, from
>> design/code review to development and testing.
>> *The most important thing we believe you, our Flink users/devs, can do to help
>> RIGHT NOW is to share your Hive use cases and give us feedback on this
>> project. As we start to go deeper into specific areas of integration, your
>> feedback and suggestions will help us refine our backlog and
>> prioritize our work, and you can get the features you want sooner! *Just
>> as an example, if most users are mainly reading Hive data, then we can
>> prioritize tuning read performance over implementing write capability.
>> A quick review of what we've finished building internally and are ready to
>> contribute back to the community:
>>   - Flink/Hive Metadata Integration
>>      - Unified, pluggable catalog infra that manages meta-objects,
>>      including catalogs, databases, tables, views, functions, partitions,
>>      table/partition stats
>>      - Three catalog impls - An in-memory catalog, HiveCatalog for
>>      embracing the Hive ecosystem, GenericHiveMetastoreCatalog for persisting
>>      Flink's streaming/batch metadata in the Hive metastore
>>      - Hierarchical metadata reference as
>>      <catalog_name>.<database_name>.<metaobject_name> in SQL and
>>      Table API
>>      - Unified function catalog based on the new catalog infra, which also
>>      supports Hive simple UDFs
>>   - Flink/Hive Data Integration
>>      - Hive data connector that reads partitioned/non-partitioned Hive
>>      tables, and supports partition pruning, both Hive simple and complex data
>>      types, and basic write
>>   - More powerful SQL Client fully integrated with the above features
>>   and more Hive-compatible SQL syntax for better end-to-end SQL experience
>> *Given the above info, we want to learn from you: How do you use Hive
>> currently? How can we solve your pain points? What features do you expect
>> from Flink-Hive integration? Those can be details like:*
>>   - *Which Hive version are you using? Do you plan to upgrade Hive?*
>>   - *Are you planning to switch Hive engines? What timeline are you
>>   looking at? What capabilities does Flink need before you would consider
>>   using it with Hive?*
>>   - *What's your motivation to try Flink-Hive? Maintain only one data
>>   processing system across your teams for simplicity and maintainability?
>>   Better performance of Flink over Hive itself?*
>>   - *What are your Hive use cases? How large is your Hive data size? Do
>>   you mainly do reading, or both reading and writing?*
>>   - *How many Hive user defined functions do you have? Are they mostly
>>   UDF, GenericUDF, or UDTF, or UDAF?*
>>   - Any questions or suggestions you have, or something as simple as how
>>   you feel about the project
>> Again, your input will be really valuable to us, and we hope that, with all of
>> us working together, the project can benefit our end users. Please feel
>> free to either reply to this thread or just to me. I'm also working on
>> creating a questionnaire to better gather your feedback; watch the
>> mailing list over the next couple of days.
>> Thanks,
>> Bowen
