flink-user mailing list archives

From Kenny Gorman <ke...@eventador.io>
Subject Re: Read mongo datasource in Flink
Date Mon, 29 Apr 2019 14:17:29 GMT
Just a thought: a robust and high-performance way to potentially achieve your goals is:

Debezium->Kafka->Flink

https://debezium.io/docs/connectors/mongodb/

Robust handling of various topologies, reasonably good scaling properties, good restartability,
and such..
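For reference, registering the Debezium MongoDB connector against a Kafka Connect cluster takes a small JSON config along these lines. This is a sketch only: the property names follow the 0.9-era docs linked above and may differ in newer Debezium versions, and the connector name, replica-set host, and database name are placeholders.

```json
{
  "name": "mongo-inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "mongodb.hosts": "rs0/mongodb:27017",
    "mongodb.name": "dbserver1",
    "database.whitelist": "inventory"
  }
}
```

Debezium then writes one Kafka topic per collection (e.g. `dbserver1.inventory.orders`), which a Flink job can consume with the standard Kafka connector.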

Thanks
Kenny Gorman
Co-Founder and CEO
www.eventador.io



> On Apr 29, 2019, at 7:47 AM, Wouter Zorgdrager <W.D.Zorgdrager@tudelft.nl> wrote:
> 
> Yes, that is correct. This is a really basic implementation that doesn't take parallelism
> into account. I think you need something like this [1] to get that working.
> 
> [1]: https://docs.mongodb.com/manual/reference/command/parallelCollectionScan/#dbcmd.parallelCollectionScan
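An alternative that doesn't depend on parallelCollectionScan (which has since been deprecated) is to range-partition the collection on `_id`: the first four bytes of an ObjectId are a Unix timestamp, so each parallel subtask can be assigned a contiguous timestamp band and issue its own bounded `find` query. A minimal sketch of the boundary arithmetic only (class and method names are illustrative, not taken from the linked code):

```java
import java.time.Instant;

// Sketch: evenly split the ObjectId timestamp range [minTs, maxTs)
// across `parallelism` subtasks; subtask i reads _id in [lower(i), upper(i)).
public class IdRangeSplitter {

    // Lower bound (inclusive) of the timestamp band for subtask i.
    static long lowerBound(long minTs, long maxTs, int parallelism, int i) {
        long span = maxTs - minTs;
        return minTs + span * i / parallelism;
    }

    // Upper bound (exclusive); the last subtask absorbs any rounding remainder.
    static long upperBound(long minTs, long maxTs, int parallelism, int i) {
        return (i == parallelism - 1) ? maxTs
                : lowerBound(minTs, maxTs, parallelism, i + 1);
    }

    public static void main(String[] args) {
        long min = Instant.parse("2019-01-01T00:00:00Z").getEpochSecond();
        long max = Instant.parse("2019-05-01T00:00:00Z").getEpochSecond();
        for (int i = 0; i < 4; i++) {
            System.out.printf("subtask %d: [%d, %d)%n", i,
                    lowerBound(min, max, 4, i), upperBound(min, max, 4, i));
        }
    }
}
```

Each subtask would then build ObjectIds from its two bound timestamps and run `find({_id: {$gte: low, $lt: high}})`; this parallelizes reasonably when inserts are spread over time, though skewed insert rates give uneven splits.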
> On Mon, 29 Apr 2019 at 14:37, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> But what about parallelism with this implementation? From what I see, there's only a single
> thread querying Mongo and fetching all the data. Am I wrong?
> 
> On Mon, Apr 29, 2019 at 2:05 PM Wouter Zorgdrager <W.D.Zorgdrager@tudelft.nl> wrote:
> For a framework I'm working on, we actually implemented a (basic) Mongo source [1]. It's
> written in Scala and uses Json4s [2] to parse the data into a case class. It uses a Mongo
> observer to iterate over a collection and emit each document into a Flink context.
> 
> Cheers,
> Wouter
> 
> [1]: https://github.com/codefeedr/codefeedr/blob/develop/codefeedr-plugins/codefeedr-mongodb/src/main/scala/org/codefeedr/plugins/mongodb/BaseMongoSource.scala

> [2]: http://json4s.org/
> On Mon, 29 Apr 2019 at 13:57, Flavio Pompermaier <pompermaier@okkam.it> wrote:
> I'm not aware of an official source/sink. If you want, you could try to exploit the Mongo
> HadoopInputFormat as in [1].
> The provided link uses a pretty old version of Flink, but it should not be a big problem
> to update the Maven dependencies and the code to a newer version.
> 
> Best,
> Flavio
> 
> [1] https://github.com/okkam-it/flink-mongodb-test
> On Mon, Apr 29, 2019 at 6:15 AM Hai <hai@magicsoho.com> wrote:
> Hi,
> 
> Can anyone give me a clue about how to read MongoDB's data as a batch/streaming datasource
> in Flink? I can't find a MongoDB connector in the recent release version.
> 
> Many thanks
> 
> 

