manifoldcf-dev mailing list archives

From Gustavo Beneitez <gustavo.benei...@gmail.com>
Subject Re: ManifoldCF database model
Date Mon, 29 Oct 2018 16:38:57 GMT
Hi,

we ran a new test: the job created several documents that were never removed
from Elasticsearch after the job was deleted, and the Simple History never
showed them as deleted.

I also looked for an error in the logs, without luck.

I think it could be case (2). Can I increase the log detail for the web
repository connector? This and the Elasticsearch connector are both default
connectors; there are no code changes here.

Thanks.

On Mon, Oct 29, 2018 at 16:12, Karl Wright (<daddywri@gmail.com>) wrote:

> It is only possible if:
>
> (1) You run a job in a "minimal" configuration, or
> (2) There is a bug in the repository connector, so that it doesn't properly
> signal the status of a deleted document to the pipeline, or
> (3) There is a bug in the output connector so that deletion of a document
> silently fails but is nevertheless reported as having succeeded.
>
> The way to figure this out is to look at the Simple History for one of the
> documents you expect to have been deleted to see how it was handled.
>
> Thanks,
> Karl
>
>
> On Mon, Oct 29, 2018 at 11:06 AM Gustavo Beneitez <
> gustavo.beneitez@gmail.com> wrote:
>
> > Hi Karl,
> >
> > after several tests I did manage to create, run and delete a job with the
> > Elastic output connector, and all its documents were also deleted from the
> > database while they were not deleted from the repository.
> >
> > In which cases is this possible? Maybe if they share a repository?
> >
> > Thanks in advance!
> >
> >
> > On Wed, Oct 17, 2018 at 14:40, Gustavo Beneitez (<
> > gustavo.beneitez@gmail.com>) wrote:
> >
> > > Ok thanks!
> > >
> > > On Wed, Oct 17, 2018 at 14:27, Karl Wright (<daddywri@gmail.com>)
> > > wrote:
> > >
> > >> Ok, the schema is described in ManifoldCF In Action.
> > >>
> > >> https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs
> > >>
> > >> Karl
> > >>
> > >>
> > >> On Wed, Oct 17, 2018 at 7:41 AM Gustavo Beneitez <
> > >> gustavo.beneitez@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi Karl,
> > >> >
> > >> > as far as I was able to gather from the history records, MCF is
> > >> > behaving as expected. The "problem" shows up when Elasticsearch is
> > >> > down or performing badly: MCF says the document was requested to be
> > >> > deleted, but while it has been erased from the database, it is still
> > >> > alive on the Elasticsearch side, so I need to find out whether those
> > >> > kinds of inconsistencies exist.
> > >> >
> > >> > Please allow us to check those documents and run new tests in order
> > >> > to see what really happens. We don't modify any database records by
> > >> > hand.
> > >> >
> > >> > Thanks!
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Tue, Oct 16, 2018 at 19:27, Karl Wright (<daddywri@gmail.com>)
> > >> > wrote:
> > >> >
> > >> > > Hi, you can look at ManifoldCF In Action.  There's a link to it on
> > >> > > the manifoldcf page.
> > >> > >
> > >> > > However, you should be aware that we consider it a severe bug if
> > >> > > ManifoldCF doesn't clean up after itself.  The only time that is
> > >> > > not expected is when people write buggy connectors or mess with
> > >> > > database tables themselves.  I would urge you to examine the Simple
> > >> > > History report and try to come up with a reproducible test case
> > >> > > rather than trying to reverse engineer MCF.  Should you go directly
> > >> > > to the database, we will be unable to give you any support.
> > >> > >
> > >> > > Thanks,
> > >> > > Karl
> > >> > >
> > >> > >
> > >> > > On Tue, Oct 16, 2018 at 11:51 AM Gustavo Beneitez <
> > >> > > gustavo.beneitez@gmail.com> wrote:
> > >> > >
> > >> > > > Hi all,
> > >> > > >
> > >> > > > how do you do? I was wondering whether there is any technical
> > >> > > > document describing the meaning of each database table and the
> > >> > > > relationships between documents, repositories, jobs and output
> > >> > > > connectors (some kind of database model).
> > >> > > >
> > >> > > > We are facing some "garbage issues": jobs are created,
> > >> > > > duplicated, related to transformations, linked to outputs
> > >> > > > (Elasticsearch), run and finally deleted, but in the end
> > >> > > > documents that should also be deleted from the output connector
> > >> > > > are sometimes still there. We don't know whether they remain
> > >> > > > because they point to an existing job, because of an unexpected
> > >> > > > job end, or because of some other failure.
> > >> > > >
> > >> > > > We need to understand the database model in order to check when
> > >> > > > documents stored in Elasticsearch can be safely removed because
> > >> > > > they are no longer referenced by any process. This check should
> > >> > > > run periodically, every week for example.
> > >> > > >
> > >> > > > Thanks in advance!
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>
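
One way to build the reproducible test case suggested in the thread is to record the document IDs a job indexed, delete the job, and then check which of those IDs are still present in Elasticsearch. The sketch below is only an illustration under stated assumptions: the function names `find_orphans` and `make_es_exists` are hypothetical, the document path `/<index>/_doc/<id>` depends on the Elasticsearch version, and it assumes you already know the `_id` values the output connector used (the thread does not say how they are derived). It only reads from the index and never touches the ManifoldCF database.

```python
from typing import Callable, Iterable, List
from urllib.parse import quote
import urllib.error
import urllib.request


def find_orphans(expected_deleted: Iterable[str],
                 exists: Callable[[str], bool]) -> List[str]:
    """Return the IDs that should be gone but are still present in the index."""
    return [doc_id for doc_id in expected_deleted if exists(doc_id)]


def make_es_exists(base_url: str, index: str) -> Callable[[str], bool]:
    """Build an existence check that sends an HTTP HEAD request per document.

    Assumption: documents live at /<index>/_doc/<id>. Older Elasticsearch
    releases use /<index>/<type>/<id>, so adjust the path for your version.
    """
    def exists(doc_id: str) -> bool:
        # Percent-encode the whole ID, since IDs such as URLs contain slashes.
        url = f"{base_url}/{index}/_doc/{quote(doc_id, safe='')}"
        request = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                return response.status == 200
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return False  # document is gone, as expected
            raise  # any other status is a real error, so surface it
    return exists
```

For example, `find_orphans(ids_recorded_before_deletion, make_es_exists("http://localhost:9200", "my_index"))` would list the documents that survived the job deletion; the host and index name here are placeholders. Any ID it returns is a candidate to look up in the Simple History report.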
