manifoldcf-dev mailing list archives

From Alessandro Benedetti <benedetti.ale...@gmail.com>
Subject Re: [Windows Shares Connector] Un-expected removal of all documents
Date Tue, 31 Mar 2015 14:50:03 GMT
Currently we are checking each of the String[] oldVersions, trying to
access it...
So in the scenario I described, the current performance is quite bad.
We would need to avoid scanning the old docs at all if we know the
provided credentials are no longer valid.
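
A minimal sketch of the idea, not the connector's actual code. It assumes
we can tell "the login itself was rejected" apart from "this one document
is not readable", which still needs experimentation. The exception types
and the DocumentFetcher interface are hypothetical stand-ins for the
per-document Samba access.

```java
import java.util.ArrayList;
import java.util.List;

final class AbortOnInvalidCredentials {
    /** Hypothetical: the share rejected the login itself (e.g. password changed). */
    static class InvalidCredentialsException extends Exception {
        InvalidCredentialsException(String message) { super(message); }
    }

    /** Hypothetical: the login works, but this one document is not readable. */
    static class AccessDeniedException extends Exception {
        AccessDeniedException(String message) { super(message); }
    }

    /** Hypothetical stand-in for accessing a single document through Samba. */
    interface DocumentFetcher {
        String fetchVersion(String documentId)
                throws InvalidCredentialsException, AccessDeniedException;
    }

    /**
     * Returns the ids that are individually unreadable (today's "skip and
     * keep going" case). An InvalidCredentialsException is deliberately not
     * caught: it propagates on the first failure, so the scan stops and no
     * document is marked as unreachable.
     */
    static List<String> scan(String[] documentIds, DocumentFetcher fetcher)
            throws InvalidCredentialsException {
        List<String> unreadable = new ArrayList<>();
        for (String id : documentIds) {
            try {
                fetcher.fetchVersion(id);
            } catch (AccessDeniedException e) {
                unreadable.add(id);
            }
        }
        return unreadable;
    }
}
```

With this shape the caller can turn the propagated exception into a job
error instead of removing the old docs from the index.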

Let me be extreme, but what about not allowing the job to start at all if
the Repository Connector is currently broken (i.e. the connection is not
working, and we know that from the check method)?
This way we avoid destroying already existing indexes and simply show a
message in the job advising that it cannot start because the Repository
Connector is currently offline (along with the explanation).
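
Roughly, the gate could look like the sketch below. This is only an
illustration: CheckableConnector, StartDecision and decide are hypothetical
names, and the "Connection working" prefix is my assumption about what a
healthy check method reports.

```java
final class JobStartGate {
    /** Hypothetical minimal view of a repository connector: only its check method. */
    interface CheckableConnector {
        String check();
    }

    /** Assumption: a healthy connector reports a status starting with this text. */
    static final String OK_PREFIX = "Connection working";

    /** Result of the pre-flight check: whether to start, and what to show in the job. */
    static final class StartDecision {
        final boolean allowed;
        final String message;

        StartDecision(boolean allowed, String message) {
            this.allowed = allowed;
            this.message = message;
        }
    }

    /** Runs the check once, before the job starts, and refuses to start on failure. */
    static StartDecision decide(CheckableConnector connector) {
        String status;
        try {
            status = connector.check();
        } catch (RuntimeException e) {
            // A check that blows up counts as "not working" as well.
            status = "check failed: " + e.getMessage();
        }
        if (status != null && status.startsWith(OK_PREFIX)) {
            return new StartDecision(true, "Job can start");
        }
        return new StartDecision(false,
                "Job not started: repository connector is offline (" + status + ")");
    }
}
```

Since the check runs once per job start and not once per group of
documents, it should not add per-document cost.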

Does this make sense?

2015-03-31 15:30 GMT+01:00 Karl Wright <daddywri@gmail.com>:

> Hi Alessandro,
>
> If you put a check in the processDocuments method, it will be called for
> every group of documents.  That's fine, but if you structure it as a
> separate call it would impact performance.  That is why I suggest just
> doing a better job of interpreting the existing exceptions.
>
> Karl
>
>
> On Tue, Mar 31, 2015 at 10:27 AM, Alessandro Benedetti <
> benedetti.alex85@gmail.com> wrote:
>
> > In addition, this should be quite simple: do not proceed with the
> > processDocuments method if the RepositoryConnector is not able to
> > connect (the check method does not return a proper message).
> >
> > Right?
> > I am wondering where the proper place to add this check is :)
> >
> > Cheers
> >
> > 2015-03-31 14:59 GMT+01:00 Alessandro Benedetti <
> > benedetti.alex85@gmail.com>:
> >
> > > Yes Karl,
> > > I was thinking exactly that: first check whether the credentials are
> > > valid, before scanning all the documents.
> > > This is because per-file permissions depend on users/groups, but the
> > > current scenario is not invalidating the user; it is invalidating the
> > > access of that user.
> > >
> > > An error must be thrown, but the docs must not be deleted (not even
> > > scanned).
> > >
> > > Furthermore, what will happen if the server is down?
> > > Are we safe in that scenario?
> > >
> > > Cheers
> > >
> > > 2015-03-31 14:42 GMT+01:00 Karl Wright <daddywri@gmail.com>:
> > >
> > >> This is actually pretty standard behavior across our connector family,
> > >> and has been true since Day One.  The behavior comes from the basic
> > >> broad requirement that the crawler should keep going and skip the
> > >> document when the permissions do not allow it to be fetched.  With the
> > >> Windows Share connector, it's sometimes the case (when DFS is used a
> > >> lot) that whole subtrees of documents are not fetchable using the
> > >> credentials supplied.  So it is not so easy to just check for valid
> > >> credentials at the beginning.
> > >>
> > >> For a solution, I'd be inclined to look for a way to figure out if the
> > >> credentials are actually *invalid*, and abort the job if so.  This is
> > >> distinct from the case where the credentials are valid but the
> > >> connector doesn't have permissions to read the document.  It will take
> > >> some experimentation to see if we get back different exception text in
> > >> the two situations.
> > >>
> > >> Karl
> > >>
> > >>
> > >> On Tue, Mar 31, 2015 at 9:30 AM, Alessandro Benedetti <
> > >> abenedetti@apache.org> wrote:
> > >>
> > >> > Hi guys,
> > >> > playing with the Windows Shares Connector in ManifoldCF 1.8 I
> > >> > encountered this problem:
> > >> >
> > >> > *Scenario*
> > >> > *1)* Indexing a Windows Shares server
> > >> > *2)* Indexing finishes successfully with N docs indexed
> > >> > *3)* Offline, while no indexing is happening, the Administrator
> > >> > password is changed on the Shares server side
> > >> > *4)* The Repository Connector is no longer able to connect (of course,
> > >> > because the password has changed)
> > >> > *5)* On the next indexing cycle, ALL docs are removed from the index.
> > >> >
> > >> > *Expected Behaviour*
> > >> > As a user I would like to see an error message that lets me
> > >> > understand the issue, without losing all my N indexed docs.
> > >> >
> > >> > *Reason*
> > >> > Taking a look at the code, the problem seems to be in:
> > >> >
> > >> > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector#getDocumentVersions
> > >> >
> > >> > where it tries to access each document individually through Samba,
> > >> > removing them one by one if they are no longer reachable.
> > >> >
> > >> > *Solution*
> > >> > Before scanning each document, we have to be sure the connection is
> > >> > working.
> > >> > If it is not, the scan is only harmful.
> > >> >
> > >> > I will continue investigating, but I would like your opinion as well.
> > >> >
> > >> > Cheers
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > --------------------------
> > >> >
> > >> > Benedetti Alessandro
> > >> > Visiting card : http://about.me/alessandro_benedetti
> > >> >
> > >> > "Tyger, tyger burning bright
> > >> > In the forests of the night,
> > >> > What immortal hand or eye
> > >> > Could frame thy fearful symmetry?"
> > >> >
> > >> > William Blake - Songs of Experience -1794 England
> > >> >
> > >>
> > >
> > >
> > >
> > >
> >
> >
> >
> >
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
