From: Karl Wright
Date: Mon, 29 Oct 2018 13:18:00 -0400
Subject: Re: ManifoldCF database model
To: dev
Reply-To: dev@manifoldcf.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000f3b2e20579614088"

--000000000000f3b2e20579614088
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

You can enable
repository connector debug logging by adding this to your properties.xml:

Having said that, the cleanup phase for all connectors is executed by the
framework. We know the framework works because we have numerous
integration tests that exercise it. But it's up to the ES connector to
delete documents and log the fact that it is deleting documents. So I
suspect that it is the ES connector's delete functionality that is not
working properly.

If you told me that *no* documents show up in the Simple History as being
deleted during the cleanup phase, then there would obviously be a simple
ES connector bug involved. But if there are multiple documents that *do*
get deleted, it's more complex than that. Do you ever see *any* documents
deleted during the cleanup phase in the Simple History with the ES
connector?

Another easy check is to set up exactly the same job but with the output
going to the Null Output Connector. This connector definitely logs
everything it sees. Compare and contrast vs the ES output connector. If
you see a difference, it's likely a bug in the ES connector that we'll
have to figure out.

Thanks,
Karl

On Mon, Oct 29, 2018 at 12:39 PM Gustavo Beneitez <
gustavo.beneitez@gmail.com> wrote:

> Hi,
>
> we made a new test: the job created several documents that were never
> removed from Elastic Search after job deletion, and the Simple History
> never showed them as deleted.
>
> I also looked for an error in the logs, without luck.
>
> I think it could be case 2). Can I increase the log detail for the web
> repository? This and the Elastic connector are both default connectors;
> no code changes here.
>
> Thanks.
>
> On Mon, Oct 29, 2018 at 16:12, Karl Wright wrote:
>
> > It is only possible if:
> >
> > (1) You run a job in a "minimal" configuration, or
> > (2) There is a bug in the repository connector so that it doesn't
> > properly signal the status of a deleted document to the pipeline, or
> > (3) There is a bug in the output connector so that deletion of a
> > document silently fails but is nevertheless reported as having
> > succeeded.
> >
> > The way to figure this out is to look at the Simple History for one
> > of the documents you expect to have been deleted to see how it was
> > handled.
> >
> > Thanks,
> > Karl
> >
> > On Mon, Oct 29, 2018 at 11:06 AM Gustavo Beneitez <
> > gustavo.beneitez@gmail.com> wrote:
> >
> > > Hi Karl,
> > >
> > > after several tests I managed to create, run and delete a job with
> > > the Elastic output connector, and all its documents were also
> > > deleted from the database while they were not deleted from the
> > > repository.
> > >
> > > Under which cases is this possible? Maybe if they share a repo?
> > >
> > > Thanks in advance!
> > >
> > > On Wed, Oct 17, 2018 at 14:40, Gustavo Beneitez (<
> > > gustavo.beneitez@gmail.com>) wrote:
> > >
> > > > Ok thanks!
> > > >
> > > > On Wed, Oct 17, 2018 at 14:27, Karl Wright wrote:
> > > >
> > > >> Ok, the schema is described in ManifoldCF In Action.
> > > >>
> > > >> https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs
> > > >>
> > > >> Karl
> > > >>
> > > >> On Wed, Oct 17, 2018 at 7:41 AM Gustavo Beneitez <
> > > >> gustavo.beneitez@gmail.com> wrote:
> > > >>
> > > >> > Hi Karl,
> > > >> >
> > > >> > as far as I was able to gather from the history records, I
> > > >> > could see MCF is behaving as expected.
> > > >> > The "problem" shows when ElasticSearch is down or performing
> > > >> > badly: MCF says the document was requested to be deleted, but
> > > >> > while it has been erased from the database, it is still alive
> > > >> > on the ElasticSearch side, so I need to find out whether
> > > >> > those kinds of inconsistencies exist.
> > > >> >
> > > >> > Please allow us to check those documents and make new tests
> > > >> > in order to see what really happens; we don't modify any
> > > >> > database record by hand.
> > > >> >
> > > >> > Thanks!
> > > >> >
> > > >> > On Tue, Oct 16, 2018 at 19:27, Karl Wright (<
> > > >> > daddywri@gmail.com>) wrote:
> > > >> >
> > > >> > > Hi, you can look at ManifoldCF In Action. There's a link to
> > > >> > > it on the manifoldcf page.
> > > >> > >
> > > >> > > However, you should be aware that we consider it a severe
> > > >> > > bug if ManifoldCF doesn't clean up after itself. The only
> > > >> > > time that is not expected is when people write buggy
> > > >> > > connectors or mess with database tables themselves. I would
> > > >> > > urge you to examine the Simple History report and try to
> > > >> > > come up with a reproducible test case rather than trying to
> > > >> > > reverse engineer MCF. Should you go directly to the
> > > >> > > database, we will be unable to give you any support.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Karl
> > > >> > >
> > > >> > > On Tue, Oct 16, 2018 at 11:51 AM Gustavo Beneitez <
> > > >> > > gustavo.beneitez@gmail.com> wrote:
> > > >> > >
> > > >> > > > Hi all,
> > > >> > > >
> > > >> > > > how do you do?
> > > >> > > > I was wondering if there is any technical document about
> > > >> > > > the meaning of each table in the database and the
> > > >> > > > relationship between documents, repositories, jobs and
> > > >> > > > any output connector (some kind of database model).
> > > >> > > >
> > > >> > > > We are facing some "garbage issues": jobs are created,
> > > >> > > > duplicated, related to transformations, linked to outputs
> > > >> > > > (Elastic Search), run and finally deleted, but in the end
> > > >> > > > documents that should also be deleted from the output
> > > >> > > > connector sometimes are still there. We don't know if
> > > >> > > > they are visible because they point to an existing job,
> > > >> > > > because of an unexpected job end, or because of some
> > > >> > > > other failure.
> > > >> > > >
> > > >> > > > We need to understand the database model in order to
> > > >> > > > check when documents stored in Elastic can be safely
> > > >> > > > removed because they are no longer referenced by any
> > > >> > > > process. That check should run periodically, every week
> > > >> > > > for example.
> > > >> > > >
> > > >> > > > Thanks in advance!

--000000000000f3b2e20579614088--