manifoldcf-dev mailing list archives

From "Nichols, Richard" <Richard.Nich...@coriant.com>
Subject RE: [Windows Shares Connector] Un-expected removal of all documents
Date Tue, 07 Apr 2015 14:43:02 GMT
Will checking for "Logon failure: unknown user name or bad password." work for non-English
Windows installations?
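One way to sidestep the language question is to compare the numeric NT status code instead of the message text. As far as I know, jcifs 1.3.x looks up that message client-side from the status code, and `SmbException.getNtStatus()` returns the code itself, but treat both points as assumptions to verify. The sketch below assumes you already have the integer. The class name, the method name, and the choice of which codes count as "bad credentials" are hypothetical; the constants are the standard Windows NTSTATUS values.

```java
import java.util.Set;

/** Hypothetical helper: classify an SMB failure by NTSTATUS code, not message text. */
public final class LogonFailureCheck {

    // Standard Windows NTSTATUS values
    static final int STATUS_ACCESS_DENIED      = 0xC0000022;
    static final int STATUS_NO_SUCH_USER       = 0xC0000064;
    static final int STATUS_WRONG_PASSWORD     = 0xC000006A;
    static final int STATUS_LOGON_FAILURE      = 0xC000006D;
    static final int STATUS_PASSWORD_EXPIRED   = 0xC0000071;
    static final int STATUS_ACCOUNT_DISABLED   = 0xC0000072;
    static final int STATUS_ACCOUNT_LOCKED_OUT = 0xC0000234;

    // Codes meaning the configured account itself cannot log on (a design choice)
    private static final Set<Integer> BAD_CREDENTIALS = Set.of(
        STATUS_NO_SUCH_USER, STATUS_WRONG_PASSWORD, STATUS_LOGON_FAILURE,
        STATUS_PASSWORD_EXPIRED, STATUS_ACCOUNT_DISABLED, STATUS_ACCOUNT_LOCKED_OUT);

    private LogonFailureCheck() {}

    /** True if the status says the account cannot authenticate at all. */
    static boolean isBadCredentials(int ntStatus) {
        return BAD_CREDENTIALS.contains(ntStatus);
    }

    public static void main(String[] args) {
        System.out.println(isBadCredentials(STATUS_LOGON_FAILURE)); // true
        System.out.println(isBadCredentials(STATUS_ACCESS_DENIED)); // false
    }
}
```

Treating a locked-out or expired account as fatal is a judgment call: in both cases no document can be read until an administrator acts, so aborting seems safer than deleting.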

Regards,
Rick

-----Original Message-----
From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, April 07, 2015 8:52 AM
To: dev
Subject: Re: [Windows Shares Connector] Un-expected removal of all documents

Yes, this is exactly what I was thinking of.  You can go ahead and commit
this to trunk, and pull up the change to the dev_1x branch also.

Thanks!
Karl


On Tue, Apr 7, 2015 at 8:42 AM, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:

> Hi Karl,
> back to the issue: I think I solved it in a quick, not very intrusive way:
>
>
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector#getDocumentVersions
>
> org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java:706
>
> ...
>
> catch ( jcifs.smb.SmbAuthException e )
> {
>     // Compare constant-first so a null message cannot cause a NullPointerException
>     if ( "Logon failure: unknown user name or bad password.".equals( e.getMessage() ) )
>     {
>         // Bad credentials: abort the batch instead of marking the document unreadable
>         throw new ManifoldCFException( "SmbAuthException thrown: " + e.getMessage(), e );
>     }
>     Logging.connectors.warn(
>         "JCIFS: Authorization exception reading version information for "
>             + documentIdentifier + " - skipping" );
>     rval[i] = null;
> }
>
> ...
>
> This way the message is checked, and if it is a logon failure we throw a
> ManifoldCFException, which breaks the iteration (a logon failure means no
> document will be accessible, but we must not erase them).
>
> If it is any other authorization exception (e.g. the permissions on the
> folder/file changed), the behaviour is the same as before.
>
> I think this should be enough to be safe. What do you think?
>
> Is any other method affected by this problem?
>
> I think it should be limited to the version check.
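The fix above can be sketched as a self-contained loop. `AuthException`, `FatalException` (standing in for `ManifoldCFException`) and `VersionFetcher` are hypothetical stand-ins for the jcifs and ManifoldCF types, and the check assumes the NTSTATUS code is available rather than only the message. A logon failure aborts the whole batch, while an access-denied error on one document affects only that document.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the version-check loop with the logon-failure fix. */
public final class VersionLoopSketch {

    static final int STATUS_ACCESS_DENIED = 0xC0000022;
    static final int STATUS_LOGON_FAILURE = 0xC000006D;

    /** Stand-in for jcifs.smb.SmbAuthException, carrying an NTSTATUS code. */
    static final class AuthException extends Exception {
        final int ntStatus;
        AuthException(int ntStatus) {
            super(String.format("NTSTATUS 0x%08X", ntStatus));
            this.ntStatus = ntStatus;
        }
    }

    /** Stand-in for ManifoldCFException: aborts the whole batch. */
    static final class FatalException extends Exception {
        FatalException(String msg, Throwable cause) { super(msg, cause); }
    }

    /** Hypothetical per-document version lookup. */
    interface VersionFetcher {
        String fetch(String documentId) throws AuthException;
    }

    /** One version per document; null means "unreadable", which leads to removal. */
    static String[] getVersions(List<String> ids, VersionFetcher fetcher) throws FatalException {
        String[] rval = new String[ids.size()];
        for (int i = 0; i < ids.size(); i++) {
            try {
                rval[i] = fetcher.fetch(ids.get(i));
            } catch (AuthException e) {
                if (e.ntStatus == STATUS_LOGON_FAILURE) {
                    // Bad credentials: abort; the partially filled array is discarded
                    throw new FatalException("Bad credentials: " + e.getMessage(), e);
                }
                // Any other authorization error: only this document is skipped
                rval[i] = null;
            }
        }
        return rval;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a.txt", "secret.txt", "b.txt");
        VersionFetcher denyOne = id -> {
            if (id.equals("secret.txt")) throw new AuthException(STATUS_ACCESS_DENIED);
            return "v1";
        };
        try {
            System.out.println(Arrays.toString(getVersions(ids, denyOne))); // [v1, null, v1]
            getVersions(ids, id -> { throw new AuthException(STATUS_LOGON_FAILURE); });
        } catch (FatalException e) {
            System.out.println("aborted: " + e.getMessage());
        }
    }
}
```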
>
>
> Cheers
>
>
> 2015-04-02 16:32 GMT+01:00 Alessandro Benedetti <benedetti.alex85@gmail.com>:
>
> >
> >
> > 2015-04-02 15:58 GMT+01:00 Karl Wright <daddywri@gmail.com>:
> >
> >> Hi Alessandro,
> >>
> >> Yes, you interpreted my reply correctly.
> >>
> >> I think we therefore have to perform any checking operations on the
> >> actual file being accessed.  This is actually pretty easy to do without
> >> sacrificing performance.  All you need to do is the following:
> >>
> >> try {
> >>   ... do the file access operation ...
> >> } catch (SmbException e) {
> >>   ... figure out from the exception whether to throw a ManifoldCFException
> >>   or a ServiceInterruption ...
> >>   ... If the exception does not include enough to distinguish between bad
> >>   credentials and insufficient privs, then do a check RIGHT HERE for bad
> >>   credentials ...
> >> }
> >>
> >> What do you think?  The new code would only ever be called if the
> >> document cannot be read.
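Karl's outline above could be filled in roughly as follows. This is a sketch under assumptions: `StatusException` is a hypothetical stand-in for an SMB exception carrying an NTSTATUS code, and the mapping of each outcome to `ManifoldCFException`, `ServiceInterruption`, or a skipped document is illustrative, not the connector's actual logic.

```java
import java.io.IOException;
import java.net.ConnectException;

/** Hypothetical classifier for a failure seen while reading one document. */
public final class FailureClassifier {

    /** What the connector should do; the mapping to ManifoldCF types is illustrative. */
    enum Action {
        ABORT_JOB,      // e.g. throw ManifoldCFException
        RETRY_LATER,    // e.g. throw ServiceInterruption
        SKIP_DOCUMENT   // e.g. report the document as unreadable
    }

    static final int STATUS_ACCESS_DENIED = 0xC0000022;
    static final int STATUS_LOGON_FAILURE = 0xC000006D;

    /** Stand-in for an SMB exception that carries an NTSTATUS code. */
    static final class StatusException extends IOException {
        final int ntStatus;
        StatusException(int ntStatus) {
            super(String.format("NTSTATUS 0x%08X", ntStatus));
            this.ntStatus = ntStatus;
        }
    }

    static Action classify(Throwable t) {
        // Walk the cause chain, since the status may be wrapped in another exception
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof StatusException) {
                int status = ((StatusException) c).ntStatus;
                if (status == STATUS_LOGON_FAILURE) {
                    return Action.ABORT_JOB;      // bad credentials: nothing is readable
                }
                if (status == STATUS_ACCESS_DENIED) {
                    return Action.SKIP_DOCUMENT;  // valid user, no rights on this document
                }
            }
        }
        // Anything else (timeouts, refused connections, unknown codes): retry later
        return Action.RETRY_LATER;
    }

    public static void main(String[] args) {
        System.out.println(classify(new StatusException(STATUS_LOGON_FAILURE)));          // ABORT_JOB
        System.out.println(classify(new StatusException(STATUS_ACCESS_DENIED)));          // SKIP_DOCUMENT
        System.out.println(classify(new IOException(new ConnectException("refused")))); // RETRY_LATER
    }
}
```

Defaulting unknown failures to `RETRY_LATER` is deliberate in this sketch: treating an unclassified error as "document unreadable" is what leads to documents being removed in the scenario that started this thread.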
> >>
> >
> > I think we can proceed as you said. I am investigating the details
> > returned with the exception right now, to understand whether there is any
> > difference between wrong credentials and access denied.
> > If we detect wrong credentials, we have to throw the exception and stop
> > the iteration (this will happen on the very first document, assuming
> > nobody is changing things on the server side in the meantime).
> > This way we save the time of checking all the files (with wrong
> > credentials none of them will be accessible).
> >
> > Another option is to do the credential check once at the beginning and
> > stop only if the credentials are wrong (leaving the permission check to
> > be done file by file).
> >
> > It is quite a confusing scenario, but we can sort it out with small
> > changes :)
> >
> >
> >
> >>
> >> Karl
> >>
> >>
> >> On Thu, Apr 2, 2015 at 10:42 AM, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:
> >>
> >> > OK, I am currently working on that, and I will continue next Tuesday as
> >> > well.
> >> > But what about point 2:
> >> > "(2) the check itself is specific to the ROOT of the tree, which the
> >> > user may not have access to."
> >> >
> >> > I think I see the problem: you mean a scenario where the repository
> >> > connector is configured with a user that cannot access the root but can
> >> > access the directories we want to crawl.
> >> > In that case the repository connector will appear unable to connect,
> >> > while crawling is still possible if the accessible directories are
> >> > configured in the job.
> >> > If this is correct, the situation is more complicated...
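One possible way around point (2) is to run the credential probe against the paths configured in the job rather than the server root, and to treat only an explicit logon failure as bad credentials. This is a hypothetical sketch: `StartPathProbe`, `credentialsRejected`, and the probe function (which would wrap whatever SMB call yields an NTSTATUS code) are assumptions, not existing connector code.

```java
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical credential check over the job's start paths, not the server root. */
public final class StartPathProbe {

    static final int STATUS_SUCCESS       = 0x00000000;
    static final int STATUS_ACCESS_DENIED = 0xC0000022;
    static final int STATUS_LOGON_FAILURE = 0xC000006D;

    /**
     * Returns true only if some probe reports a logon failure. Access denied,
     * on the root or on any path, is not taken as proof of bad credentials.
     */
    static boolean credentialsRejected(List<String> startPaths, ToIntFunction<String> probe) {
        for (String path : startPaths) {
            if (probe.applyAsInt(path) == STATUS_LOGON_FAILURE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated server: the user cannot see the root but can read /projects
        Map<String, Integer> server = Map.of(
            "smb://server/", STATUS_ACCESS_DENIED,
            "smb://server/projects/", STATUS_SUCCESS);
        ToIntFunction<String> probe = p -> server.getOrDefault(p, STATUS_ACCESS_DENIED);

        System.out.println(credentialsRejected(List.of("smb://server/projects/"), probe)); // false
        System.out.println(credentialsRejected(List.of("smb://server/"), probe));          // false
    }
}
```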
> >> >
> >> > Cheers
> >> >
> >> >
> >> > 2015-03-31 16:44 GMT+01:00 Karl Wright <daddywri@gmail.com>:
> >> >
> >> > > Hi Alessandro,
> >> > >
> >> > > Your code snippet has two problems: (1) it doesn't distinguish between
> >> > > service interruptions and bad credentials,
> >> >
> >> >
> >> > Shouldn't the difference be between the IOException and the SmbException?
> >> >
> >> >
> >> > > and (2) the check itself is specific to the ROOT of the tree, which the
> >> > > user may not have access to.
> >> > >
> >> > > In check() we can get away with this, but if you wire up the check()
> >> > > logic into the crawl processing it will break some people.
> >> > >
> >> > > The first problem, (1), is exactly what we need to figure out anyway.
> >> > >
> >> > > Karl
> >> > >
> >> > >
> >> > > On Tue, Mar 31, 2015 at 11:30 AM, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:
> >> > >
> >> > > > Hi Karl, comments follow:
> >> > > >
> >> > > > 2015-03-31 16:18 GMT+01:00 Karl Wright <daddywri@gmail.com>:
> >> > > >
> >> > > > > Hi Alessandro,
> >> > > > >
> >> > > > > There are situations where the check() method does not succeed but
> >> > > > > you can still crawl.  So I would not do it that way, since it
> >> > > > > fundamentally changes the contract.
> >> > > > >
> >> > > >
> >> > > > Am I wrong, or should we assume that the check() method works as it
> >> > > > was designed to?
> >> > > > I mean, if this method is implemented wrongly in some cases, that
> >> > > > should not break another assumption.
> >> > > >
> >> > > > >
> >> > > > > My proposal is to have processDocuments ABORT the job when it finds
> >> > > > > bad credentials.  That's very fast and will not permit a job to run
> >> > > > > for a long time.
> >> > > > >
> >> > > > > The trick is to determine if there are bad credentials WITHOUT doing
> >> > > > > any more work in the processDocuments pathway than we currently are.
> >> > > > > An exception will be thrown either way, but we need to figure out
> >> > > > > whether there is any information in the exception that we can use
> >> > > > > to decide between bad credentials and no access permissions.
> >> > > > >
> >> > > > > You can help provide that by doing a simple experiment on your
> >> > > > > client's hardware (or yours, if you have such hardware in house).
> >> > > > > Change the credential to an invalid one and see what the exception
> >> > > > > details are.  Then change to valid credentials and try to crawl a
> >> > > > > directory that is not visible to the credentialed user you
> >> > > > > supplied, and make a note of the exception details in that case
> >> > > > > too.
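For the experiment described above, a small helper that records the exception class, message and cause chain makes the two cases easy to compare side by side. It uses only the JDK, and `ExceptionReport` is a hypothetical name. With jcifs one would presumably also log the NT status code, which this sketch does not cover.

```java
import java.io.IOException;

/** Hypothetical helper: summarize an exception and its causes for comparison. */
public final class ExceptionReport {

    static String describe(Throwable t) {
        StringBuilder sb = new StringBuilder();
        int depth = 0;
        // Cap the depth to stay safe against unusually long or cyclic cause chains
        for (Throwable c = t; c != null && depth < 10; c = c.getCause(), depth++) {
            if (depth > 0) {
                sb.append("caused by: ");
            }
            sb.append(c.getClass().getName()).append(": ").append(c.getMessage()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In the real experiment, pass the exception caught from the SMB call instead
        Throwable sample = new IOException("outer", new IllegalStateException("inner"));
        System.out.print(describe(sample));
    }
}
```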
> >> > > > >
> >> > > >
> >> > > > I was thinking of slightly modifying the getSession() method by
> >> > > > adding a file-exists check, something like this:
> >> > > >
> >> > > > ...
> >> > > >
> >> > > > try
> >> > > > {
> >> > > >     // use NtlmPasswordAuthentication so that we can reuse credential
> >> > > >     // for DFS support
> >> > > >     pa = new NtlmPasswordAuthentication( domain, username, password );
> >> > > >     SmbFile smbconnection = new SmbFile( "smb://" + server + "/", pa );
> >> > > >     smbconnectionPath = getFileCanonicalPath( smbconnection );
> >> > > >     // exists() forces a round trip to the server; failures other than
> >> > > >     // "not found" surface as an SmbException. The result is unused.
> >> > > >     smbconnection.exists();
> >> > > > }
> >> > > > catch ( MalformedURLException e )
> >> > > > {
> >> > > >     Logging.connectors.error(
> >> > > >         "Unable to access SMB/CIFS share: " + "smb://"
> >> > > >             + ( ( domain == null ) ? "" : domain ) + ";"
> >> > > >             + username + ":<password>@" + server + "/\n" + e );
> >> > > >     throw new ManifoldCFException( "Unable to access SMB/CIFS share: " + server, e,
> >> > > >         ManifoldCFException.REPOSITORY_CONNECTION_ERROR );
> >> > > > }
> >> > > > catch ( SmbException e )
> >> > > > {
> >> > > >     Logging.connectors.error(
> >> > > >         "Unable to access SMB/CIFS share: Credential not valid - " + "smb://"
> >> > > >             + ( ( domain == null ) ? "" : domain ) + ";"
> >> > > >             + username + ":<password>@" + server + "/\n" + e );
> >> > > >     throw new ManifoldCFException(
> >> > > >         "Unable to access SMB/CIFS share: Credential not valid - " + server, e,
> >> > > >         ManifoldCFException.REPOSITORY_CONNECTION_ERROR );
> >> > > > }
> >> > > >
> >> > > > Catching the SmbException should do the trick.
> >> > > > Anyway, I will look into it in more detail.
> >> > > >
> >> > > > Cheers
> >> > > >
> >> > > >
> >> > > > > Karl
> >> > > > >
> >> > > > >
> >> > > > > On Tue, Mar 31, 2015 at 10:50 AM, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:
> >> > > > >
> >> > > > > > Currently we are checking each entry of the String[] oldVersions,
> >> > > > > > trying to access it...
> >> > > > > > So in the scenario I described, the current performance is quite
> >> > > > > > bad...
> >> > > > > > We would need to avoid scanning the old documents at all if we know
> >> > > > > > the provided credentials are no longer valid.
> >> > > > > >
> >> > > > > > Let me be extreme: what about not allowing the job to start at all
> >> > > > > > if the Repository Connector is currently broken (i.e. the
> >> > > > > > connection is not working, and we know that because of the check
> >> > > > > > method)?
> >> > > > > > This way we avoid destroying existing indexes, and we simply show
> >> > > > > > a message in the job explaining that it cannot start because the
> >> > > > > > Repository Connector is currently offline (along with the reason).
> >> > > > > >
> >> > > > > > Does this make sense?
> >> > > > > >
> >> > > > > > 2015-03-31 15:30 GMT+01:00 Karl Wright <daddywri@gmail.com>:
> >> > > > > >
> >> > > > > > > Hi Alessandro,
> >> > > > > > >
> >> > > > > > > If you put a check in the processDocuments method, it will be
> >> > > > > > > called for every group of documents.  That's fine, but if you
> >> > > > > > > structure it as a separate call it would impact performance.
> >> > > > > > > That is why I suggest just doing a better job of interpreting the
> >> > > > > > > existing exceptions.
> >> > > > > > >
> >> > > > > > > Karl
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Tue, Mar 31, 2015 at 10:27 AM, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:
> >> > > > > > >
> >> > > > > > > > As an addition, this should be quite simple: do not proceed
> >> > > > > > > > with the processDocuments method if the RepositoryConnector is
> >> > > > > > > > not able to connect (i.e. the check method does not return a
> >> > > > > > > > success message).
> >> > > > > > > >
> >> > > > > > > > Right?
> >> > > > > > > > I'm wondering where the proper place to hook this in is :)
> >> > > > > > > >
> >> > > > > > > > Cheers
> >> > > > > > > >
> >> > > > > > > > 2015-03-31 14:59 GMT+01:00 Alessandro Benedetti <benedetti.alex85@gmail.com>:
> >> > > > > > > >
> >> > > > > > > > > Yes Karl,
> >> > > > > > > > > I was thinking exactly that: first check whether the
> >> > > > > > > > > credentials are valid, before scanning all the documents.
> >> > > > > > > > > This is because per-file permissions depend on users/groups,
> >> > > > > > > > > but the current scenario does not change the permissions of
> >> > > > > > > > > the user; it invalidates that user's access altogether.
> >> > > > > > > > >
> >> > > > > > > > > An error must be thrown, but the docs must not be deleted (not
> >> > > > > > > > > even scanned).
> >> > > > > > > > >
> >> > > > > > > > > Furthermore, what will happen if the server is down?
> >> > > > > > > > > Are we safe in that scenario?
> >> > > > > > > > >
> >> > > > > > > > > Cheers
> >> > > > > > > > >
> >> > > > > > > > > 2015-03-31 14:42 GMT+01:00 Karl Wright <daddywri@gmail.com>:
> >> > > > > > > > >
> >> > > > > > > > >> This is actually pretty standard behavior across our connector
> >> > > > > > > > >> family, and has been true since Day One.  The behavior comes
> >> > > > > > > > >> from the basic broad requirement that the crawler should keep
> >> > > > > > > > >> going and skip the document when the permissions do not allow
> >> > > > > > > > >> it to be fetched.  With the Windows Share connector, it's
> >> > > > > > > > >> sometimes the case (when DFS is used a lot) that whole
> >> > > > > > > > >> subtrees of documents are not fetchable using the credentials
> >> > > > > > > > >> supplied.  So it is not so easy to just check for valid
> >> > > > > > > > >> credentials at the beginning.
> >> > > > > > > > >>
> >> > > > > > > > >> For a solution, I'd be inclined to look for a way to figure
> >> > > > > > > > >> out if the credentials are actually *invalid*, and abort the
> >> > > > > > > > >> job if so.  This is distinct from the case where the
> >> > > > > > > > >> credentials are valid but the connector doesn't have
> >> > > > > > > > >> permissions to read the document.  It will take some
> >> > > > > > > > >> experimentation to see if we get back different exception
> >> > > > > > > > >> text in the two situations.
> >> > > > > > > > >>
> >> > > > > > > > >> Karl
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >> On Tue, Mar 31, 2015 at 9:30 AM, Alessandro Benedetti <abenedetti@apache.org> wrote:
> >> > > > > > > > >>
> >> > > > > > > > >> > Hi guys,
> >> > > > > > > > >> > while playing with the Windows Shares Connector in ManifoldCF
> >> > > > > > > > >> > 1.8 I encountered this problem:
> >> > > > > > > > >> >
> >> > > > > > > > >> > *Scenario*
> >> > > > > > > > >> > *1)* Indexing a Windows Shares server
> >> > > > > > > > >> > *2)* Indexing finishes successfully with N docs indexed
> >> > > > > > > > >> > *3)* Offline, while no indexing is happening, the
> >> > > > > > > > >> > Administrator password is changed on the Shares server side
> >> > > > > > > > >> > *4)* The Repository Connector can no longer connect (of
> >> > > > > > > > >> > course, because the password has changed)
> >> > > > > > > > >> > *5)* On the next indexing cycle, ALL docs are removed from
> >> > > > > > > > >> > the index.
> >> > > > > > > > >> >
> >> > > > > > > > >> > *Expected Behaviour*
> >> > > > > > > > >> > As a user I would like to see an error message that lets me
> >> > > > > > > > >> > understand the issue, without losing all my N indexed docs.
> >> > > > > > > > >> >
> >> > > > > > > > >> > *Reason*
> >> > > > > > > > >> > Looking at the code, the problem seems to be in
> >> > > > > > > > >> >
> >> > > > > > > > >> > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector#getDocumentVersions
> >> > > > > > > > >> > where it tries to access each document individually through
> >> > > > > > > > >> > Samba, and removes them one by one if they are no longer
> >> > > > > > > > >> > reachable.
> >> > > > > > > > >> >
> >> > > > > > > > >> > *Solution*
> >> > > > > > > > >> > Before scanning each document, we have to be sure the
> >> > > > > > > > >> > connection is working.
> >> > > > > > > > >> > If it is not, scanning is only harmful.
> >> > > > > > > > >> > I will continue investigating, but I would like your opinion
> >> > > > > > > > >> > as well.
> >> > > > > > > > >> >
> >> > > > > > > > >> > Cheers
> >> > > > > > > > >> >
> >> > > > > > > > >> >
> >> > > > > > > > >> >
> >> > > > > > > > >> >
> >> > > > > > > > >> >
> >> > > > > > > > >> >
> >> > > > > > > > >> > --
> >> > > > > > > > >> > --------------------------
> >> > > > > > > > >> >
> >> > > > > > > > >> > Benedetti Alessandro
> >> > > > > > > > >> > Visiting card : http://about.me/alessandro_benedetti
> >> > > > > > > > >> >
> >> > > > > > > > >> > "Tyger, tyger burning bright
> >> > > > > > > > >> > In the forests of the night,
> >> > > > > > > > >> > What immortal hand or eye
> >> > > > > > > > >> > Could frame thy fearful symmetry?"
> >> > > > > > > > >> >
> >> > > > > > > > >> > William Blake - Songs of Experience -1794 England
> >> > > > > > > > >> >
> >> > > > > > > > >>
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >>
> >
> >
> >
>
>
>
