manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: [Windows Shares Connector] Un-expected removal of all documents
Date Tue, 31 Mar 2015 14:30:38 GMT
Hi Alessandro,

If you put a check in the processDocuments method, it will be called for
every group of documents.  That's fine, but if you structure it as a
separate call it would impact performance.  That is why I suggest just
doing a better job of interpreting the existing exceptions.

Karl
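
A minimal sketch, in Java, of what "interpreting the existing exceptions" could look like. It assumes the SMB client's exception carries an NTSTATUS code. ShareAccessException, JobAbortException, Fetcher, classify and pathsToRemove are hypothetical names used for illustration only; they are not ManifoldCF or jCIFS APIs. The three status values are the standard Windows NTSTATUS codes. Whether the connector really sees different codes (or different exception text) in the two situations is the part that still needs experimentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CrawlFailureSketch {

    // Standard Windows NTSTATUS values. That the SMB client surfaces them
    // on its exceptions is an assumption of this sketch.
    static final int STATUS_ACCESS_DENIED  = 0xC0000022;
    static final int STATUS_WRONG_PASSWORD = 0xC000006A;
    static final int STATUS_LOGON_FAILURE  = 0xC000006D;

    /** Hypothetical stand-in for the exception thrown by the SMB client. */
    static class ShareAccessException extends Exception {
        final int ntStatus;
        ShareAccessException(int ntStatus) {
            super(String.format("NTSTATUS 0x%08X", ntStatus));
            this.ntStatus = ntStatus;
        }
    }

    /** Hypothetical signal that the job must stop and leave the index untouched. */
    static class JobAbortException extends RuntimeException {
        JobAbortException(String message) { super(message); }
    }

    enum Failure { INVALID_CREDENTIALS, NO_PERMISSION, OTHER }

    /** Separates "the login itself was rejected" from "valid login, unreadable document". */
    static Failure classify(ShareAccessException e) {
        switch (e.ntStatus) {
            case STATUS_LOGON_FAILURE:
            case STATUS_WRONG_PASSWORD:
                return Failure.INVALID_CREDENTIALS;
            case STATUS_ACCESS_DENIED:
                return Failure.NO_PERMISSION;
            default:
                return Failure.OTHER;
        }
    }

    /** Hypothetical per-document access; returns a version string or throws. */
    interface Fetcher {
        String fetchVersion(String path) throws ShareAccessException;
    }

    /**
     * Returns the paths that should be dropped from the index.
     * Throws JobAbortException, removing nothing, if the credentials are rejected.
     */
    static List<String> pathsToRemove(List<String> paths, Fetcher fetcher) {
        List<String> remove = new ArrayList<>();
        for (String path : paths) {
            try {
                fetcher.fetchVersion(path);
            } catch (ShareAccessException e) {
                switch (classify(e)) {
                    case INVALID_CREDENTIALS:
                        throw new JobAbortException("Credentials rejected: " + e.getMessage());
                    case NO_PERMISSION:
                        remove.add(path); // valid login, unreadable document: skip it
                        break;
                    default:
                        break; // unknown failure: this sketch keeps the document
                }
            }
        }
        return remove;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("docs/a.txt", "restricted/b.txt");

        // Valid login, one unreadable subtree: only that document is dropped.
        Fetcher partial = path -> {
            if (path.startsWith("restricted/")) {
                throw new ShareAccessException(STATUS_ACCESS_DENIED);
            }
            return "v1";
        };
        System.out.println(pathsToRemove(paths, partial));

        // Changed password: every fetch fails, so the job aborts instead.
        Fetcher badLogin = path -> { throw new ShareAccessException(STATUS_LOGON_FAILURE); };
        try {
            pathsToRemove(paths, badLogin);
        } catch (JobAbortException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```

Because the classification happens inside the existing per-document error handling, it adds no separate connection check and so no extra round trip per group of documents. How to treat the OTHER case (for example, the server being down) is left open here.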


On Tue, Mar 31, 2015 at 10:27 AM, Alessandro Benedetti <
benedetti.alex85@gmail.com> wrote:

> As an addition, this should be quite simple: do not proceed with the
> processDocuments method if the RepositoryConnector is not able to connect
> (that is, if the check method does not return a proper message).
>
> Right?
> I am wondering where the proper point is to add this check :)
>
> Cheers
>
> 2015-03-31 14:59 GMT+01:00 Alessandro Benedetti <benedetti.alex85@gmail.com>:
>
> > Yes Karl,
> > I was thinking exactly that: first check whether the credentials are valid,
> > before scanning all the documents.
> > This is because per-file permissions depend on users/groups, but the current
> > scenario is not invalidating the user, but invalidating the access of that
> > user.
> >
> > An error must be thrown, but the docs must not be deleted (or even scanned).
> >
> > Furthermore, what will happen if the server is down?
> > Are we safe in that scenario?
> >
> > Cheers
> >
> > 2015-03-31 14:42 GMT+01:00 Karl Wright <daddywri@gmail.com>:
> >
> >> This is actually pretty standard behavior across our connector family, and
> >> has been true since Day One.  The behavior comes from the basic broad
> >> requirement that the crawler should keep going and skip the document when
> >> the permissions do not allow it to be fetched.  With the Windows Share
> >> connector, it's sometimes the case (when DFS is used a lot) that whole
> >> subtrees of documents are not fetchable using the credentials supplied.  So
> >> it is not so easy to just check for valid credentials at the beginning.
> >>
> >> For a solution, I'd be inclined to look for a way to figure out if the
> >> credentials are actually *invalid*, and abort the job if so.  This is
> >> distinct from the case where the credentials are valid but the connector
> >> doesn't have permissions to read the document.  It will take some
> >> experimentation to see if we get back different exception text in the two
> >> situations.
> >>
> >> Karl
> >>
> >>
> >> On Tue, Mar 31, 2015 at 9:30 AM, Alessandro Benedetti <abenedetti@apache.org> wrote:
> >>
> >> > Hi guys,
> >> > playing with the Windows Shares Connector in ManifoldCF 1.8 I encountered
> >> > this problem:
> >> >
> >> > *Scenario*
> >> > *1)* Index a Windows Shares server
> >> > *2)* Indexing finishes successfully with N docs indexed
> >> > *3)* Offline, while no indexing is happening, the Administrator password
> >> > is changed on the Shares server side
> >> > *4)* The Repository Connector is no longer able to connect (of course,
> >> > because the password has changed)
> >> > *5)* On the next indexing cycle, ALL docs are removed from the index.
> >> >
> >> > *Expected Behaviour*
> >> > As a user, I would like to see an error message that lets me understand
> >> > the issue, rather than losing all my N indexed docs.
> >> >
> >> > *Reason*
> >> > Looking at the code, the problem seems to be in:
> >> >
> >> > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector#getDocumentVersions
> >> >
> >> > where it tries to access each document individually through Samba,
> >> > removing them one by one if they are no longer reachable.
> >> >
> >> > *Solution*
> >> > Before scanning each document, we have to be sure the connection is
> >> > working.
> >> > If it is not, scanning is only harmful.
> >> >
> >> > I will continue investigating, but I would like your opinion as well
> >> >
> >> > Cheers
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > --------------------------
> >> >
> >> > Benedetti Alessandro
> >> > Visiting card : http://about.me/alessandro_benedetti
> >> >
> >> > "Tyger, tyger burning bright
> >> > In the forests of the night,
> >> > What immortal hand or eye
> >> > Could frame thy fearful symmetry?"
> >> >
> >> > William Blake - Songs of Experience -1794 England
> >> >
> >>
