manifoldcf-dev mailing list archives

From Alessandro Benedetti <benedetti.ale...@gmail.com>
Subject Re: [Windows Shares Connector] Un-expected removal of all documents
Date Tue, 31 Mar 2015 13:59:43 GMT
Yes Karl,
I was thinking exactly that: first check whether the credentials are valid,
before scanning all the documents.
This is because per-file permissions depend on users/groups, but the current
scenario is not about a user lacking permission on some files; it is the
access of that user being invalidated altogether.

An error must be thrown, but the docs must not be deleted (not even scanned).
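
A minimal sketch of that decision in Java, assuming the SMB layer reports the NTSTATUS code of each failure. As Karl notes in the quoted reply, whether the two cases can really be told apart still needs experimentation. The class, enum and method names are hypothetical, not ManifoldCF or jCIFS API; the constants are the standard Windows NTSTATUS values.

```java
import java.util.Set;

/**
 * Hypothetical sketch: what to do when fetching one document fails,
 * keyed on the NTSTATUS code of the failure (an assumption: the
 * connector may only see exception text).
 */
public class ShareFailurePolicy {

    /** Hypothetical actions, not ManifoldCF API. */
    public enum Action { ABORT_JOB, SKIP_DOCUMENT, DELETE_DOCUMENT }

    // Standard Windows NTSTATUS values.
    public static final int STATUS_ACCESS_DENIED = 0xC0000022;
    public static final int STATUS_OBJECT_NAME_NOT_FOUND = 0xC0000034;
    public static final int STATUS_OBJECT_PATH_NOT_FOUND = 0xC000003A;
    public static final int STATUS_WRONG_PASSWORD = 0xC000006A;
    public static final int STATUS_LOGON_FAILURE = 0xC000006D;
    public static final int STATUS_PASSWORD_EXPIRED = 0xC0000071;
    public static final int STATUS_ACCOUNT_LOCKED_OUT = 0xC0000234;

    // Codes meaning the configured credentials themselves are no longer valid.
    private static final Set<Integer> INVALID_CREDENTIALS = Set.of(
            STATUS_WRONG_PASSWORD, STATUS_LOGON_FAILURE,
            STATUS_PASSWORD_EXPIRED, STATUS_ACCOUNT_LOCKED_OUT);

    public static Action decide(int ntStatus) {
        if (INVALID_CREDENTIALS.contains(ntStatus)) {
            // Every other document would fail the same way: stop the job
            // and report it, instead of removing the whole index.
            return Action.ABORT_JOB;
        }
        if (ntStatus == STATUS_OBJECT_NAME_NOT_FOUND
                || ntStatus == STATUS_OBJECT_PATH_NOT_FOUND) {
            // The server answered and the file is really gone.
            return Action.DELETE_DOCUMENT;
        }
        // Anything else, including STATUS_ACCESS_DENIED: valid credentials
        // but this file (or DFS subtree) is not readable, so keep the
        // existing per-document behaviour described in the quoted reply.
        return Action.SKIP_DOCUMENT;
    }

    public static void main(String[] args) {
        System.out.println(decide(STATUS_LOGON_FAILURE));         // ABORT_JOB
        System.out.println(decide(STATUS_ACCESS_DENIED));         // SKIP_DOCUMENT
        System.out.println(decide(STATUS_OBJECT_NAME_NOT_FOUND)); // DELETE_DOCUMENT
    }
}
```

With this split, a changed Administrator password can abort the job at the first failed fetch instead of producing N deletions, while a DFS subtree the account cannot read still only affects those documents.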

Furthermore, what will happen if the server is down?
Are we safe in that scenario?
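
For the server-down case, a minimal sketch, assuming the failure surfaces as a standard java.net exception somewhere in the cause chain (the SMB library may wrap it differently). The class, enum and the notFound flag are hypothetical, not ManifoldCF API. The point is only that "could not reach the server" is retried later and never treated as "document gone".

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

/** Hypothetical sketch: an unreachable server must never cause deletions. */
public class UnreachableServerCheck {

    /** Hypothetical outcomes for one failed document check. */
    public enum Outcome { RETRY_LATER, DOCUMENT_MISSING, OTHER_FAILURE }

    /** Walks the cause chain looking for a transport-level failure. */
    public static boolean isServerUnreachable(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof ConnectException
                    || c instanceof NoRouteToHostException
                    || c instanceof SocketTimeoutException
                    || c instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Only an explicit "not found" answer (the hypothetical notFound flag)
     * justifies deletion, and only if the server was actually reached.
     */
    public static Outcome classify(Throwable failure, boolean notFound) {
        if (isServerUnreachable(failure)) {
            return Outcome.RETRY_LATER; // keep the indexed copy, try again later
        }
        if (notFound) {
            return Outcome.DOCUMENT_MISSING;
        }
        return Outcome.OTHER_FAILURE; // report it, do not delete
    }

    public static void main(String[] args) {
        Exception down = new IOException("fetch failed",
                new ConnectException("Connection refused"));
        System.out.println(classify(down, false));                          // RETRY_LATER
        System.out.println(classify(new IOException("no such file"), true)); // DOCUMENT_MISSING
    }
}
```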

Cheers

2015-03-31 14:42 GMT+01:00 Karl Wright <daddywri@gmail.com>:

> This is actually pretty standard behavior across our connector family, and
> has been true since Day One.  The behavior comes from the basic broad
> requirement that the crawler should keep going and skip the document when
> the permissions do not allow it to be fetched.  With the Windows Share
> connector, it's sometimes the case (when DFS is used a lot) that whole
> subtrees of documents are not fetchable using the credentials supplied.  So
> it is not so easy to just check for valid credentials at the beginning.
>
> For a solution, I'd be inclined to look for a way to figure out if the
> credentials are actually *invalid*, and abort the job if so.  This is
> distinct from the case where the credentials are valid but the connector
> doesn't have permissions to read the document.  It will take some
> experimentation to see if we get back different exception text in the two
> situations.
>
> Karl
>
>
> On Tue, Mar 31, 2015 at 9:30 AM, Alessandro Benedetti <abenedetti@apache.org> wrote:
>
> > Hi guys,
> > while playing with the Windows Shares Connector in ManifoldCF 1.8, I
> > encountered this problem:
> >
> > *Scenario*
> > *1)* Indexing a Windows Shares server
> > *2)* Indexing successfully finished with N docs indexed
> > *3)* Offline (while no indexing is happening), the Administrator password
> > is changed on the Shares server side
> > *4)* The Repository Connector is not able to connect anymore (of course,
> > because the password has changed)
> > *5)* On the next indexing cycle, ALL docs are removed from the index.
> >
> > *Expected Behaviour*
> > As a user, I would like to see an error message that lets me understand
> > the issue, rather than losing all my N indexed docs.
> >
> > *Reason*
> > Looking into the code, the problem seems to be in:
> >
> > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector#getDocumentVersions
> > where it tries to access each document individually through Samba,
> > removing each one that is no longer reachable.
> >
> > *Solution*
> > Before scanning each document, we have to be sure the connection is
> > working.
> > Otherwise, this is only harmful.
> >
> > I will continue investigating, but I would like your opinion as well.
> >
> > Cheers
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>
