From: Karl Wright
Date: Mon, 3 Oct 2016 10:57:29 -0400
Subject: Re: Custom Transfo Connector - Strange behaviour
To: user@manifoldcf.apache.org

Hi Marc,

Sounds like you are running into the incremental nature of the platform.

The framework keeps track of a "version string" for each document from each
connector involved in the pipeline. If the version string differs, then the
framework knows that it must continue pushing the document down the pipeline.
If not, then the framework may conclude that it is unnecessary to continue.

I would look at how other similar transformation connectors handle the version
string that they return. I suspect that your code may be missing a subtlety
there. You can also confirm this picture by going to the output connection's
view page and clicking the appropriate "forget" button and running the job
again. If you see ingestions, you will know that you have connector problems
that prevent MCF from doing its incremental logic properly.

Please let me know what you find.

Thanks,
Karl

On Mon, Oct 3, 2016 at 10:40 AM, Marc Emery wrote:

> Hi,
>
> First of all, thanks for this amazing framework!
>
> I'm running a 2.4 Command-driven multi-process manifoldcf, with a custom
> transformation connector deployed in /connector-lib.
>
> Once registered, I add the connector in first place after a web connector.
> Everything runs fine the first time,
>
> 10-03-2016 14:35:01.171  document ingest (Solr)    https://library....  OK        0      11
> 10-03-2016 14:35:01.162  extract [transfo tika]    https://library...   OK        0      3
> 10-03-2016 14:35:01.151  enhance [transfo biblio]  https://library...   ACCEPTED  0      35
> 10-03-2016 14:35:01.150  process                   https://library....  OK        12815  38
> 10-03-2016 14:35:00.009  fetch                     https://library...   200       12815  1136
>
> but on subsequent run, each url ingestion stops after a successful fetch,
> without reaching downstream connectors.
>
> 10-03-2016 16:06:01.085  fetch  https://library...  200  13992  1250
> 10-03-2016 16:05:56.084  fetch  https://library...  200  15505  1090
> 10-03-2016 16:05:51.084  fetch  https://library...  200  12876  922
>
> I can't see any errors in the logs.
>
> How could I debug this? Thanks for your help.
>
> Regards
> marc
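For readers of this thread, here is a toy sketch of the incremental behaviour Karl describes. It is not ManifoldCF code: the class `VersionStringSketch`, the `IncrementalPipeline` helper, the `transformVersion` method, and the example URI are all hypothetical names invented for illustration. The sketch assumes the framework remembers one combined version string per document and skips downstream processing when that string has not changed, and that "forget" simply clears what was remembered.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of version-string-based incremental indexing; NOT the ManifoldCF API. */
public class VersionStringSketch {

    /** Hypothetical stand-in for the framework's per-document bookkeeping. */
    public static final class IncrementalPipeline {
        // Last combined version string that was pushed downstream, keyed by document URI.
        private final Map<String, String> lastIndexed = new HashMap<>();

        /** Returns true if the document is pushed down the pipeline, false if it is skipped. */
        public boolean process(String uri, String documentVersion, String transformVersion) {
            String combined = documentVersion + "+" + transformVersion;
            if (combined.equals(lastIndexed.get(uri))) {
                return false; // nothing changed, so downstream connectors are not invoked
            }
            lastIndexed.put(uri, combined);
            return true;
        }

        /** Models the "forget" button: drop everything remembered about indexed documents. */
        public void forgetAll() {
            lastIndexed.clear();
        }
    }

    /**
     * Hypothetical version string for a transformation connector. The assumption is that
     * every setting that can change the connector's output must be encoded here.
     */
    public static String transformVersion(String fieldName, boolean keepOriginal) {
        return fieldName.length() + ":" + fieldName + (keepOriginal ? "+" : "-");
    }

    public static void main(String[] args) {
        IncrementalPipeline pipeline = new IncrementalPipeline();
        String uri = "https://example.org/doc"; // hypothetical document URI
        String v1 = transformVersion("author", true);

        System.out.println("first run:      " + pipeline.process(uri, "etag-1", v1));
        System.out.println("second run:     " + pipeline.process(uri, "etag-1", v1));

        // Changing a setting changes the version string, so the document flows again.
        String v2 = transformVersion("author", false);
        System.out.println("config changed: " + pipeline.process(uri, "etag-1", v2));

        // After "forget", even an unchanged document is pushed downstream again.
        pipeline.forgetAll();
        System.out.println("after forget:   " + pipeline.process(uri, "etag-1", v2));
    }
}
```

In this model, a second run that only shows the fetch step is the expected result when neither the document nor the connector configuration has changed. A connector whose version string leaves out a setting that affects its output would be skipped even after that setting changes, which is the kind of subtlety worth checking in a custom connector.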