flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Flink and Presto integration
Date Tue, 28 Jan 2020 11:04:04 GMT
Hive metastore is the de facto standard for Hadoop but in my use case I
have to query other databases (like MySQL, Oracle and SQL Server).
So Presto would be a good choice (apart from the fact that you need to
restart it when you add a new catalog), and I'd like to have an easy
translation of the catalogs.
Another fear I have is that I could have different versions of the same
database type (e.g. Oracle or SQL Server), and I'll probably hit an
incompatibility when using the latest jar of a connector.
From what I see this corner case doesn't have a clear solution, but I have
some workarounds in mind that I need to verify (e.g. shading jars or allocating
source reader tasks to different Task Managers based on the deployed jar).
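
To make the "easy translation of the catalogs" idea concrete, here is a minimal
sketch, not an existing integration. It assumes a JDBC-backed Presto catalog
file (e.g. etc/catalog/mysql.properties) with the keys connector.name,
connection-url, connection-user and connection-password, and maps them to the
option names of Flink's JDBC SQL connector as found in more recent Flink
releases (connector, url, table-name, username, password). The function names
and the set of supported connectors are hypothetical, and the option names
should be checked against the Flink version in use. Depending on the database,
the Presto URL may also need a database name appended before Flink can use it.

```python
from typing import Dict

# Assumption: only these JDBC-backed Presto connectors are translated here.
JDBC_CONNECTORS = {"mysql", "oracle", "sqlserver", "postgresql"}


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines of a Presto catalog .properties file."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def presto_to_flink_options(props: Dict[str, str], table_name: str) -> Dict[str, str]:
    """Map Presto JDBC catalog properties to Flink JDBC connector options."""
    connector = props.get("connector.name")
    if connector not in JDBC_CONNECTORS:
        raise ValueError(f"unsupported Presto connector: {connector!r}")
    options = {
        "connector": "jdbc",
        "url": props["connection-url"],
        "table-name": table_name,
    }
    if "connection-user" in props:
        options["username"] = props["connection-user"]
    if "connection-password" in props:
        options["password"] = props["connection-password"]
    return options


def flink_with_clause(options: Dict[str, str]) -> str:
    """Render the options as the WITH (...) clause of a Flink CREATE TABLE."""
    def quote(value: str) -> str:
        # Double single quotes so the value stays a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    body = ",\n".join(f"  {quote(k)} = {quote(v)}" for k, v in options.items())
    return f"WITH (\n{body}\n)"


if __name__ == "__main__":
    sample = (
        "connector.name=mysql\n"
        "connection-url=jdbc:mysql://example.net:3306\n"
        "connection-user=root\n"
        "connection-password=secret\n"
    )
    print(flink_with_clause(presto_to_flink_options(parse_properties(sample), "orders")))
```

This only covers the connection settings. Column definitions, partitions and
statistics would still have to come from the database or from Presto itself,
and the driver-version problem described above is not addressed by it.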

On Tue, Jan 28, 2020 at 11:05 AM Piotr Nowojski <piotr@ververica.com> wrote:

> Hi,
> Yes, Presto (in the presto-hive connector) is just using the Hive Metastore to
> get the table definitions/metadata. If you connect to the same Hive Metastore
> with Flink, both systems should be able to see the same tables.
> Piotrek
> On 28 Jan 2020, at 04:34, Jingsong Li <jingsonglee0@gmail.com> wrote:
> Hi Flavio,
> Is your requirement to use Blink batch to read the tables in Presto?
> I'm not familiar with Presto's catalog. Is it like the Hive Metastore?
> If so, what needs to be done is similar to the Hive connector.
> You need to implement a Presto catalog, which translates the Presto
> table into a Flink table. You may need to deal with partitions, statistics,
> and so on.
> Best,
> Jingsong Lee
> On Mon, Jan 27, 2020 at 9:58 PM Itamar Syn-Hershko <
> itamar@bigdataboutique.com> wrote:
>> Yes, Flink does batch processing by "reevaluating a stream" so to speak.
>> Presto doesn't have sources and sinks, only catalogs (which always
>> allow reads, and sometimes also writes).
>> Presto catalogs are a configuration - they are managed on the node
>> filesystem as a configuration file and nowhere else. Flink sources/sinks
>> are programmatically configurable and are compiled into your Flink program.
>> So that is not possible at the moment, and all that's possible to do is get
>> that info from the API of both products and visualize that. Definitely not
>> managing them from a single place.
>> On Mon, Jan 27, 2020 at 3:54 PM Flavio Pompermaier <pompermaier@okkam.it>
>> wrote:
>>> Both Presto and Flink make use of a Catalog in order to be able to
>>> read/write data from a source/sink.
>>> I don't agree that "Flink is about processing data streams", because
>>> Flink is also competitive for batch workloads (and this will be further
>>> improved in the next releases).
>>> I'd like to register my data sources/sinks in one single catalog (e.g.
>>> Presto) and then be able to reuse it in Flink as well (with a simple
>>> translation).
>>> My idea of integration here is thus more at the catalog level: I would
>>> use Presto for exploring data from the UI and Flink to process it once
>>> the configuration part has finished (I have many Flink jobs that I
>>> don't want to throw away or rewrite).
>>> On Mon, Jan 27, 2020 at 2:30 PM Itamar Syn-Hershko <
>>> itamar@bigdataboutique.com> wrote:
>>>> Hi Flavio,
>>>> Presto contributor and Starburst Partners here.
>>>> Presto and Flink are solving completely different challenges. Flink is
>>>> about processing data streams as they come in; Presto is about ad-hoc /
>>>> periodic querying of data sources.
>>>> A typical architecture would use Flink to process data streams and
>>>> write data and aggregations to some data stores (Redis, MemSQL, SQL
>>>> databases, Elasticsearch, etc.) and then use Presto to query those data
>>>> stores (and possibly others as well, using query federation).
>>>> What kind of integration are you looking for?
>>>> On Mon, Jan 27, 2020 at 1:44 PM Flavio Pompermaier <
>>>> pompermaier@okkam.it> wrote:
>>>>> Hi all,
>>>>> is there any integration between Presto and Flink? I'd like to use
>>>>> Presto for the UI part (preview and so on) while using Flink for the
>>>>> processing. Or would you suggest something else?
>>>>> Best,
>>>>> Flavio
>>>> --
>>>> [image: logo] <https://bigdataboutique.com/>
>>>> Itamar Syn-Hershko
>>>> CTO, Founder
>>>> +972-54-2467860
>>>> itamar@bigdataboutique.com
>>>> https://bigdataboutique.com
>>>> <https://www.linkedin.com/in/itamar-syn-hershko-78b25013>
>>>> <https://twitter.com/synhershko>
>>>> <https://www.youtube.com/channel/UCBHr7lM2u6SCWPJvcKug-Yg>
> --
> Best, Jingsong Lee
