From: Kenny Gorman
Message-Id: <87D293D1-1569-4B49-B874-946720F49346@eventador.io>
Subject: Re: Read mongo datasource in Flink
Date: Mon, 29 Apr 2019 09:17:29 -0500
To: Wouter Zorgdrager
Cc: Flavio Pompermaier, Hai, user@flink.apache.org

Just a thought: a robust, high-performance way to potentially achieve your goals is

Debezium -> Kafka -> Flink

https://debezium.io/docs/connectors/mongodb/

Good robust handling of various topologies, reasonably good scaling properties, good restartability, and such.

Thanks,
Kenny Gorman
Co-Founder and CEO
www.eventador.io

> On Apr 29, 2019, at 7:47 AM, Wouter Zorgdrager <W.D.Zorgdrager@tudelft.nl> wrote:
>
> Yes, that is correct. This is a really basic implementation that doesn't take parallelism into account. I think you need something like this [1] to get that working.
>
> [1]: https://docs.mongodb.com/manual/reference/command/parallelCollectionScan/#dbcmd.parallelCollectionScan
>
> On Mon, 29 Apr 2019 at 14:37, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>
> But what about parallelism with this implementation? From what I see, there's only a single thread querying Mongo and fetching all the data. Am I wrong?
>
> On Mon, Apr 29, 2019 at 2:05 PM Wouter Zorgdrager <W.D.Zorgdrager@tudelft.nl> wrote:
>
> For a framework I'm working on, we actually implemented a (basic) Mongo source [1]. It's written in Scala and uses Json4s [2] to parse the data into a case class.
> It uses a Mongo observer to iterate over a collection and emit it into a Flink context.
>
> Cheers,
> Wouter
>
> [1]: https://github.com/codefeedr/codefeedr/blob/develop/codefeedr-plugins/codefeedr-mongodb/src/main/scala/org/codefeedr/plugins/mongodb/BaseMongoSource.scala
> [2]: http://json4s.org/
>
> On Mon, 29 Apr 2019 at 13:57, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>
> I'm not aware of an official source/sink. If you want, you could try to exploit the Mongo HadoopInputFormat as in [1]. The linked example uses a pretty old version of Flink, but it should not be a big problem to update the Maven dependencies and the code to a newer version.
>
> Best,
> Flavio
>
> [1]: https://github.com/okkam-it/flink-mongodb-test
>
> On Mon, Apr 29, 2019 at 6:15 AM Hai <hai@magicsoho.com> wrote:
>
> Hi,
>
> Can anyone give me a clue about how to read MongoDB's data as a batch/streaming datasource in Flink? I don't find the MongoDB connector in the recent release version.
>
> Many thanks
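[Editor's note] On the parallelism concern raised in the thread: one common workaround is to give each parallel source subtask a disjoint slice of the collection, e.g. by hashing `_id`. The sketch below shows only that routing logic in plain Java; it is a hypothetical illustration (the class and method names are made up, not from the codefeedr source), and a real Flink source would use the subtask index from `getRuntimeContext().getIndexOfThisSubtask()` to build its Mongo query filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MongoSplitAssigner {

    // Hypothetical routing: map a document _id to one of numSubtasks
    // parallel source instances. Each subtask would query Mongo with a
    // filter keeping only the ids in its own slice, so the collection
    // is read exactly once, in parallel.
    static int subtaskFor(String id, int numSubtasks) {
        return Math.floorMod(id.hashCode(), numSubtasks);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        List<String> ids = Arrays.asList("doc-001", "doc-002", "doc-003", "doc-004", "doc-005");

        Map<Integer, List<String>> slices = new HashMap<>();
        for (String id : ids) {
            slices.computeIfAbsent(subtaskFor(id, parallelism), k -> new ArrayList<>()).add(id);
        }

        // Every document lands in exactly one subtask's slice.
        int total = slices.values().stream().mapToInt(List::size).sum();
        System.out.println(total == ids.size()); // prints: true
    }
}
```

The same idea works with range-based splits over `_id` instead of hashing, which preserves index locality on the Mongo side.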
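[Editor's note] For the Debezium -> Kafka -> Flink route suggested at the top of the thread, the capture side is a Kafka Connect connector registration. The fragment below is a minimal, hypothetical example: the host names, logical server name, and collection filter are placeholders, and the property names should be verified against the Debezium MongoDB connector documentation linked above for the version you deploy.

```json
{
  "name": "mongo-source",
  "config": {
    "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
    "tasks.max": "1",
    "mongodb.hosts": "rs0/mongo1:27017",
    "mongodb.name": "dbserver1",
    "collection.whitelist": "inventory[.]orders"
  }
}
```

A Flink job would then consume the resulting change-event topic (named after the logical server name, database, and collection) with Flink's Kafka connector.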