flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Ingestion of data into HDFS
Date Fri, 22 May 2015 16:29:56 GMT
If you want to trigger two Flink jobs one after the other, you can do this in
one program.

Since the "env.execute()" call blocks, the second job starts only after the
first has finished.

-----

ExecutionEnvironment env1 = ExecutionEnvironment.getExecutionEnvironment();

// program 1

env1.execute();


ExecutionEnvironment env2 = ExecutionEnvironment.getExecutionEnvironment();

// program 2

env2.execute();
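Filled out with a concrete (hypothetical) source and sink, the skeleton above
might look like the following. The HDFS paths, job names, and the text-file
read/write are placeholders for illustration; in Flavio's case a Parquet sink
would take the place of writeAsText:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class SequentialJobs {

    public static void main(String[] args) throws Exception {
        // Job 1: ingest a burst of rows into a staging directory.
        ExecutionEnvironment env1 = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> burst = env1.readTextFile("hdfs:///staging/input");
        burst.writeAsText("hdfs:///staging/output");
        env1.execute("ingest burst");  // blocks until job 1 has finished

        // Job 2: runs only after job 1 has completed, so it sees all staged files.
        ExecutionEnvironment env2 = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> staged = env2.readTextFile("hdfs:///staging/output");
        staged.writeAsText("hdfs:///compacted/output");
        env2.execute("compact staged files");
    }
}
```

Because execute() submits the job and waits for it to complete, no external
coordination between the two jobs is needed.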




On Fri, May 22, 2015 at 6:21 PM, Fabian Hueske <fhueske@gmail.com> wrote:

> I'm not sure if I got your question right.
>
> Do you want to know if it is possible to implement a Flink program that
> reads several files and writes their data out in Parquet format?
> Or are you asking how such a job could be scheduled for execution based on
> some external event (such as a file appearing)?
>
> Both should be possible.
>
> The job would be a simple pipeline with or without some transformations
> depending on the required logic and a Parquet data sink.
> The job execution can be triggered from outside of Flink, for example by a
> monitoring process or a cron job that calls the CLI client with the right
> parameters.
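> A minimal sketch of such a cron entry, assuming a hypothetical job jar and
> entry class (paths and the class name are placeholders; `bin/flink run` is
> the standard CLI invocation):
>
> ```shell
> # crontab entry: run the compaction job at the top of every hour
> 0 * * * * /opt/flink/bin/flink run -c com.example.CompactionJob \
>     /opt/jobs/compaction.jar hdfs:///staging hdfs:///compacted
> ```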
>
> Best, Fabian
>
>
>
> 2015-05-22 14:55 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
>
>> Hi to all,
>>
>> in my use case I have bursts of data to store into HDFS and, once they are
>> finished, I want to compact them into a single directory (as Parquet).
>> From what I know, the current approach is to use Flume, which
>> automatically ingests data and compacts it based on some configurable
>> policy.
>> However, I'd like to avoid adding Flume to my architecture because these
>> bursts are not long-lived processes. I just want to write a batch of rows
>> as a single file in some directory and, once the process finishes, read
>> all of those files and compact them into a single output directory as
>> Parquet. It's something similar to a streaming process, but (for the
>> moment) I'd like to avoid having a long-lived Flink process listening for
>> incoming data.
>>
>> Do you have any suggestions for such a process, or is there any example in
>> the Flink code?
>>
>>
>> Best,
>> Flavio
>>
>
>
