Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 0E7A6200C91 for ; Sun, 11 Jun 2017 10:50:45 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 0D054160BD8; Sun, 11 Jun 2017 08:50:45 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id ADA51160BCF for ; Sun, 11 Jun 2017 10:50:43 +0200 (CEST) Received: (qmail 74447 invoked by uid 500); 11 Jun 2017 08:50:42 -0000 Mailing-List: contact user-help@beam.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@beam.apache.org Delivered-To: mailing list user@beam.apache.org Received: (qmail 74437 invoked by uid 99); 11 Jun 2017 08:50:42 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 11 Jun 2017 08:50:42 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 6E8D2C0A49 for ; Sun, 11 Jun 2017 08:50:42 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.889 X-Spam-Level: * X-Spam-Status: No, score=1.889 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, T_REMOTE_IMAGE=0.01] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (1024-bit key) header.d=veolia.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id NnXXSxGXTBrT for ; Sun, 11 Jun 2017 08:50:38 +0000 (UTC) Received: from mail-vk0-f54.google.com (mail-vk0-f54.google.com [209.85.213.54]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 422035F613 for ; Sun, 11 Jun 2017 08:50:38 +0000 (UTC) Received: by mail-vk0-f54.google.com with SMTP id p62so39780677vkp.0 for ; Sun, 11 Jun 2017 01:50:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=veolia.com; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=4uQuBNnlAt1G44rHn62leF/t20oic3o357RXPAFIVd4=; b=ceFuZ+teO9Rx6f//Dt8+SryJ98Yv6IVBw2FU0skZ0qk7pSvOg1mZnSWzJoaxmCKHWO 3gbVRIC+vAGyGgGWxSzpVMJwpqsryi2fFQD73xyog2BO4qT0Ljy0mCyHe1/z9tmwzxL9 QbKiJHnmKksCD/yMu2SD9GtU02m0dymmMBNdw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=4uQuBNnlAt1G44rHn62leF/t20oic3o357RXPAFIVd4=; b=AakkShOhHj5yEo7HqLxGSAESAZ4SsfIUNacnEfLnkN9gOl3yesIwLOGCp7t8b2sn2Z k8KCMqF32uWhFPqlFqTY6NqBQg2HQXuJPg2+MbrdDPCowdlqkJSUJDgoGlxwOeJ2qF7D MQ3BTD+OFNXhuCV7++lwt9Es9nTmVIPfcThPTf6oAoEQ/PGnDcwCIAqA/kq6hZZq74Q8 XgPc/YyeD75vwjBO5eZR3rXfiPPqeMLus3dMpMOwqyzdQFngEnyekMsioCka/9YvHPOf roaLRDFxA70R/saP09Ip23DF8+EQ7EFTASm0NDHtdqaSyB2G8aUSQ7BOcNqA6XAOeGg0 36Ng== X-Gm-Message-State: AODbwcCckpmySgGHWjozOGFCXbwt6uP675DhPXX5ACowaEPspFvbE7t9 eYfO4f3oMsnTr5zEHOJFW055Zq73YJRYfMr4fQv+vcsvGdrmTkMf7V7SCyVxAcWCeSP6h5Iq9Gi 73CBmccCRWSI3pv+q X-Received: by 10.31.230.70 with SMTP id d67mr25893346vkh.74.1497171032058; Sun, 11 Jun 2017 01:50:32 -0700 (PDT) MIME-Version: 1.0 Received: by 10.103.130.198 with HTTP; Sun, 11 Jun 2017 01:50:01 -0700 (PDT) In-Reply-To: References: From: "Morand, Sebastien" Date: Sun, 11 Jun 2017 10:50:01 +0200 Message-ID: Subject: Re: Action in the pipeline after Write To: user@beam.apache.org Content-Type: multipart/alternative; boundary="94eb2c09162289040e0551ab4b0e" archived-at: Sun, 11 Jun 2017 08:50:45 -0000 --94eb2c09162289040e0551ab4b0e Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Yes this use case can be treated by using parallel operation. I have a 2nd one, I would like to send a report at the end of the pipeline when the last line has been written in bigquery: number of lines treated, number of lines ignored (from another part of the pipeline using graph as you described), number of files at the begining, and so on. This report could be: 1. Write a pub/sub 2. Send an email 3. Call an url with parameters Is this possible? Regards, *S=C3=A9bastien MORAND* Team Lead Solution Architect Technology & Operations / Digital Factory Veolia - Group Information Systems & Technology (IS&T) Cell.: +33 7 52 66 20 81 / Direct: +33 1 85 57 71 08 Bureau 0144C (Ouest) 30, rue Madeleine-Vionnet - 93300 Aubervilliers, France *www.veolia.com * On 11 June 2017 at 04:14, Eugene Kirpichov wrote: > Hi! > It sounds like you want to write data to BigQuery and then load the same > data back from BigQuery? Why? I'm particularly confused by your comment > "nothing left in the PCollection" - writing a collection to BigQuery > doesn't remove data from the collection, a PCollection is just a logical > description of a dataset, not a mutable container. Transforms are like > mathematical functions - they don't change their inputs, they only comput= e > their outputs. > > Perhaps that you're assuming that Beam pipelines can only be a strict > linear sequence of transforms? That is not the case - pipelines are an > arbitrary graph, you can use a collection multiple times, i.e. apply > multiple transforms to it. E.g. you can both write the collection to > bigquery (step 3) and apply some other transform to the same collection > (step 5). > > Assuming you use Java: > PCollection foos =3D p.apply(TextIO.read().from(...)).apply(...some > transform...); > foos.apply(BigQueryIO.write().to(...)); > PCollection bars =3D foos.apply(...some other transform...); > bars.apply(BigQueryIO.write().to(...)); > > Let me know if this helps. > > On Sat, Jun 10, 2017 at 3:42 PM Morand, Sebastien < > sebastien.morand@veolia.com> wrote: > >> Hi, >> >> Is there any way to add some step after a Write, because Write return un >> PDone, so I can't do anything, but I would like actually do something. >> >> Example : >> >> 1. Load data from gcs >> 2. Some transform >> 3. Write data into bigquery >> =3D> Nothing left in the pcollection, but when 3 is over =3D> >> 4. Load data from bigquery >> 5. Some other transform >> 6. Write data into bigquery >> >> Any way to do that? >> >> Thanks, >> >> *S=C3=A9bastien MORAND* >> Team Lead Solution Architect >> Technology & Operations / Digital Factory >> Veolia - Group Information Systems & Technology (IS&T) >> Cell.: +33 7 52 66 20 81 / Direct: +33 1 85 57 71 08 >> <+33%201%2085%2057%2071%2008> >> Bureau 0144C (Ouest) >> 30, rue Madeleine-Vionnet - 93300 Aubervilliers, France >> *www.veolia.com * >> >> >> >> >> >> >> >> ------------------------------------------------------------ >> -------------------------------- >> This e-mail transmission (message and any attached files) may contain >> information that is proprietary, privileged and/or confidential to Veoli= a >> Environnement and/or its affiliates and is intended exclusively for the >> person(s) to whom it is addressed. If you are not the intended recipient= , >> please notify the sender by return e-mail and delete all copies of this >> e-mail, including all attachments. Unless expressly authorized, any use, >> disclosure, publication, retransmission or dissemination of this e-mail >> and/or of its attachments is strictly prohibited. >> >> Ce message electronique et ses fichiers attaches sont strictement >> confidentiels et peuvent contenir des elements dont Veolia Environnement >> et/ou l'une de ses entites affiliees sont proprietaires. Ils sont donc >> destines a l'usage de leurs seuls destinataires. Si vous avez recu ce >> message par erreur, merci de le retourner a son emetteur et de le detrui= re >> ainsi que toutes les pieces attachees. L'utilisation, la divulgation, la >> publication, la distribution, ou la reproduction non expressement >> autorisees de ce message et de ses pieces attachees sont interdites. >> ------------------------------------------------------------ >> -------------------------------- >> > --=20 ---------------------------------------------------------------------------= ----------------- This e-mail transmission (message and any attached files) may contain=20 information that is proprietary, privileged and/or confidential to Veolia= =20 Environnement and/or its affiliates and is intended exclusively for the=20 person(s) to whom it is addressed. If you are not the intended recipient,= =20 please notify the sender by return e-mail and delete all copies of this=20 e-mail, including all attachments. Unless expressly authorized, any use,=20 disclosure, publication, retransmission or dissemination of this e-mail=20 and/or of its attachments is strictly prohibited.=20 Ce message electronique et ses fichiers attaches sont strictement=20 confidentiels et peuvent contenir des elements dont Veolia Environnement=20 et/ou l'une de ses entites affiliees sont proprietaires. Ils sont donc=20 destines a l'usage de leurs seuls destinataires. Si vous avez recu ce=20 message par erreur, merci de le retourner a son emetteur et de le detruire= =20 ainsi que toutes les pieces attachees. L'utilisation, la divulgation, la=20 publication, la distribution, ou la reproduction non expressement=20 autorisees de ce message et de ses pieces attachees sont interdites. ---------------------------------------------------------------------------= ----------------- --94eb2c09162289040e0551ab4b0e Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Yes this use case can be treated by using parallel operati= on.

I have a 2nd one, I would like to send a report at t= he end of the pipeline when the last line has been written in bigquery: num= ber of lines treated, number of lines ignored (from another part of the pip= eline using graph as you described), number of files at the begining, and s= o on.

This report could be:
  1. Writ= e a pub/sub
  2. Send an email
  3. Call an url with parameters
Is this possible?

Regards,



S=C3=A9bastie= n MORAND
Team Lead Solution Architect
Tech= nology & Operations / Digital Factory
Veoli= a - Group Information Systems & Technology (IS&T)
Cell.:=C2=A0+33 7=C2=A052 66 20 81=C2=A0/=C2=A0Direct:=C2=A0+33 1 85 57 71 08
Bureau 0144C (Ouest)
30, rue Madelein= e-Vionnet=C2=A0-=C2=A093300 Aubervilliers, France

=C2=A0 =C2=A0=C2=A0 =C2=A0=C2=A0 =C2=A0
=

On 11 June 2017 at 04:14, Eugene Kirpichov <= span dir=3D"ltr"><kirpichov@google.com> wrote:
Hi!
It sounds like you want to write data to Big= Query and then load the same data back from BigQuery? Why? I'm particul= arly confused by your comment "nothing left in the PCollection" -= writing a collection to BigQuery doesn't remove data from the collecti= on, a PCollection is just a logical description of a dataset, not a mutable= container. Transforms are like mathematical functions - they don't cha= nge their inputs, they only compute their outputs.

Perhaps that you're assuming that Beam pipelines can only be a strict = linear sequence of transforms? That is not the case - pipelines are an arbi= trary graph, you can use a collection multiple times, i.e. apply multiple t= ransforms to it. E.g. you can both write the collection to bigquery (step 3= ) and apply some other transform to the same collection (step 5).

Assuming you use Java:
PCollection<Foo> foo= s =3D p.apply(TextIO.read().from(...)).apply(...some transform...);
foos.apply(BigQueryIO.write().to(...));
PCollection&= lt;Bar> bars =3D foos.apply(...some other transform...);
bars.= apply(BigQueryIO.write().to(...));

Let me kno= w if this helps.

On Sat, Jun 10, 2017 at 3:42 PM Morand, Sebastien = <sebast= ien.morand@veolia.com> wrote:
Hi,
Is there any way to add some step after a Write, because Write= return un PDone, so I can't do anything, but I would like actually do = something.

Example :
  1. Load data f= rom gcs
  2. Some transform
  3. Write data into bigquery
    =3D> = Nothing left in the pcollection, but when 3 is over =3D>
  4. Load da= ta from bigquery
  5. Some other transform
  6. Write data into bigqu= ery
Any way to do that?

Thanks= ,

S=C3=A9bastien MORAND
=
Team Lea= d Solution Architect
Technology & Ope= rations / Digital Factory
Veolia - Group Inform= ation Systems & Technology (IS&T)
Cell.:=C2=A0+33 7=C2=A052 66 20 81=C2=A0/=C2=A0= Direct:=C2=A0+33= 1 85 57 71 08
Bureau 0144C (Ouest)
30, rue Madeleine-Vion= net=C2=A0-=C2=A093300 Aubervilliers, France
=
=C2=A0 =C2=A0=C2=A0 =C2=A0=C2=A0 =C2=A0


----------------= -----------------------------------------------------------------= -----------
This e-mail tra= nsmission (message and any attached files) may contain information that is = proprietary, privileged and/or confidential to Veolia Environnement and/or = its affiliates and is intended exclusively for the person(s) to whom it is = addressed. If you are not the intended recipient, please notify the sender = by return e-mail and delete all copies of this e-mail, including all attach= ments. Unless expressly authorized, any use, disclosure, publication, retra= nsmission or dissemination of this e-mail and/or of its attachments is stri= ctly prohibited.=C2=A0
Ce message electronique = et ses fichiers attaches sont strictement confidentiels et peuvent contenir= des elements dont Veolia Environnement et/ou l'une de ses entites affi= liees sont proprietaires. Ils sont donc destines a l'usage de leurs seu= ls destinataires. Si vous avez recu ce message par erreur, merci de le reto= urner a son emetteur et de le detruire ainsi que toutes les pieces attachee= s. L'utilisation, la divulgation, la publication, la distribution, ou l= a reproduction non expressement autorisees de ce message et de ses pieces a= ttachees sont interdites.
= -----------------------------------------------------------------= ---------------------------



----------------------------= ----------------------------------------------------------------
This e-mail transmission (m= essage and any attached files) may contain information that is proprietary,= privileged and/or confidential to Veolia Environnement and/or its affiliat= es and is intended exclusively for the person(s) to whom it is addressed. I= f you are not the intended recipient, please notify the sender by return e-= mail and delete all copies of this e-mail, including all attachments. Unles= s expressly authorized, any use, disclosure, publication, retransmission or= dissemination of this e-mail and/or of its attachments is strictly prohibi= ted.=C2=A0

Ce message electronique et ses fichi= ers attaches sont strictement confidentiels et peuvent contenir des element= s dont Veolia Environnement et/ou l'une de ses entites affiliees sont p= roprietaires. Ils sont donc destines a l'usage de leurs seuls destinata= ires. Si vous avez recu ce message par erreur, merci de le retourner a son = emetteur et de le detruire ainsi que toutes les pieces attachees. L'uti= lisation, la divulgation, la publication, la distribution, ou la reproducti= on non expressement autorisees de ce message et de ses pieces attachees son= t interdites.
------------= -----------------------------------------------------------------= ---------------
--94eb2c09162289040e0551ab4b0e--