flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Georg Heiler <georg.kf.hei...@gmail.com>
Subject Re: Flink first project
Date Thu, 27 Apr 2017 17:59:21 GMT
Thanks for the overview. I think I will use akka streams and pipe the
result to kafka, then move on with flink.
Tzu-Li (Gordon) Tai <tzulitai@apache.org> schrieb am Do. 27. Apr. 2017 um

> Hi Georg,
> Simply from the aspect of a Flink source that listens to a REST endpoint
> for input data, there should be quite a variety of options to do that. The
> Akka streaming source from Bahir should also serve this purpose well. It
> would also be quite straightforward to implement one yourself.
> On the other hand, what Jörn was suggesting was that you would want to
> first persist the incoming data from the REST endpoint to a repayable
> storage / queue, and your Flink job reads from that replayable storage /
> queue.
> The reason for this is that Flink’s checkpointing mechanism for
> exactly-once guarantee relies on a replayable source (see [1]), and since a
> REST endpoint is not replayable, you’ll not be able to benefit from the
> fault-tolerance guarantees provided by Flink. The most popular source used
> with Flink for exactly-once, currently, is Kafka [2]. The only extra
> latency compared to just fetching REST endpoint, in this setup, is writing
> to the intermediate Kafka topic.
> Of course, if you’re just testing around and just getting to know Flink,
> this setup isn’t necessary.
> You can just start off with a source such as the Flink Akka connector in
> Bahir, and start writing your first Flink job right away :)
> Cheers,
> Gordon
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/guarantees.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/kafka.html
> On 24 April 2017 at 4:02:14 PM, Georg Heiler (georg.kf.heiler@gmail.com)
> wrote:
> Wouldn't adding flume -> Kafka -> flink also introduce additional latency?
> Georg Heiler <georg.kf.heiler@gmail.com> schrieb am So., 23. Apr. 2017 um
> 20:23 Uhr:
>> So you would suggest flume over a custom akka-source from bahir?
>> Jörn Franke <jornfranke@gmail.com> schrieb am So., 23. Apr. 2017 um
>> 18:59 Uhr:
>>> I would use flume to import these sources to HDFS and then use flink or
>>> Hadoop or whatever to process them. While it is possible to do it in flink,
>>> you do not want that your processing fails because the web service is not
>>> available etc.
>>> Via flume which is suitable for this kind of tasks it is more controlled
>>> and reliable.
>>> On 23. Apr 2017, at 18:02, Georg Heiler <georg.kf.heiler@gmail.com>
>>> wrote:
>>> New to flink I would like to do a small project to get a better feeling
>>> for flink. I am thinking of getting some stats from several REST api (i.e.
>>> Bitcoin course values from different exchanges) and comparing prices over
>>> different exchanges in real time.
>>> Are there already some REST api sources for flink as a sample to get
>>> started implementing a custom REST source?
>>> I was thinking about using https://github.com/timmolter/XChange to
>>> connect to several exchanges. E.g. to make a single api call by hand would
>>> look similar to
>>> val currencyPair = new CurrencyPair(Currency.XMR, Currency.BTC)
>>>   CertHelper.trustAllCerts()
>>>   val poloniex =
>>> ExchangeFactory.INSTANCE.createExchange(classOf[PoloniexExchange].getName)
>>>   val dataService = poloniex.getMarketDataService
>>>   generic(dataService)
>>>   raw(dataService.asInstanceOf[PoloniexMarketDataServiceRaw])
>>> System.out.println(dataService.getTicker(currencyPair))
>>> How would be a proper way to make this available as a flink source? I
>>> have seen
>>> https://github.com/apache/flink/blob/master/flink-contrib/flink-connector-wikiedits/src/main/java/org/apache/flink/streaming/connectors/wikiedits/WikipediaEditsSource.java
>>> new to flink am a bit unsure how to proceed.
>>> Regards,
>>> Georg

View raw message