manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Session Based access credentials in Web Connector - Redirect issues
Date Mon, 21 Sep 2015 23:48:16 GMT
Hi Deepa,

For a start, it sounds like you are making some fairly basic errors here.

For example, you should be thinking in terms of what pages exactly
constitute login pages and identifying those; if properly identified these
will NEVER be indexed.  If you are seeing pages that you think are part of
the login sequence being indexed, then the login pages have not been
specified correctly in some way.

Second, the cookies for a session-authenticated part of the site are
created by the site, not by you.  What you must do is walk the web
connector through the login sequence and allow it to set the cookies it
needs to set.  The Web Connector sets aside the cookies in effect at the
end of the login sequence.  So trying to set them as form parameters will
not generally work, unless by "cookie" you really mean some kind of
credential.
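
To illustrate why cookies are not form parameters, here is a minimal Python
sketch of what any HTTP client does with them: the server sends Set-Cookie
headers, and the client echoes the values back in a Cookie header on later
requests.  The header values below are hypothetical, and this is not the
Web Connector's own code, only the general mechanism.

```python
from http.cookies import SimpleCookie


def cookies_from_response(set_cookie_headers):
    """Collect cookie name/value pairs from a response's Set-Cookie headers."""
    jar = {}
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)
        for name, morsel in parsed.items():
            jar[name] = morsel.value
    return jar


def cookie_header(jar):
    """Build the Cookie request header a client sends on subsequent fetches."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())


# Hypothetical Set-Cookie headers, standing in for what the login site returns.
set_cookie_headers = [
    "JSESSIONID=abc123; Path=/; HttpOnly",
    "Cookie1=value1; Path=/",
]

jar = cookies_from_response(set_cookie_headers)
print(cookie_header(jar))
```

The cookie values never appear in a submitted form; they travel in headers,
which is why overriding form parameters does not set them.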

From your description of the problem, it sounds like you will need the
following login pages:

- one to catch the redirection from xyz.com to the login page.  The login
page type would be "redirection" and the target regexp would be something
like "https://xyzLogin\.com/opensso/UI/Login\?realm=/xyz" (since this is a
regexp, "." and "?" must be escaped to match literally).
- one to fill in form values on the first login page
- one to fill in form values on the second login page
- one to describe the redirection back to wherever you started from
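
One thing to watch with these regexps: a URL pasted in verbatim is not
necessarily a pattern that matches itself, because "?" and "." are regexp
metacharacters.  The sketch below shows this with Python's re module.
ManifoldCF is Java, so it uses Java regular expressions, but these two
metacharacters behave the same way there.  How the connector applies the
pattern (whole string or substring) is not assumed here.

```python
import re

# The login URL from the redirect described in this thread.
login_url = "https://xyzLogin.com/opensso/UI/Login?realm=/xyz&goto=http://xyz.com/"

# Pasted verbatim: "n?" means "an optional n", so the literal "?" never matches.
unescaped = r"https://xyzLogin.com/opensso/UI/Login?realm=/xyz"

# Metacharacters escaped so that "." and "?" match literally.
escaped = r"https://xyzLogin\.com/opensso/UI/Login\?realm=/xyz"

unescaped_matches = re.search(unescaped, login_url) is not None
escaped_matches = re.search(escaped, login_url) is not None

print("unescaped pattern matches:", unescaped_matches)  # False
print("escaped pattern matches:", escaped_matches)      # True
```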

When you look in the simple history you should see BEGIN LOGIN happen with
the first redirection, and it should walk all the way back to your original
page, logging END LOGIN just before it fetches that.  If the fetch fails
because the login didn't work for some reason, then often you'll see the
cycle repeat.  If you never even get into the login sequence, you may
need to try things out with browser HTTP logging so you can see what's
actually happening.
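
As a scripted complement to browser logging, a redirect chain can also be
traced one hop at a time.  The following is only a sketch: trace_redirects
and fake_fetch are hypothetical names, the 302 status is an assumption
(the thread only says the client "is redirected"), and fake_fetch stands in
for a real HTTP request so the logic can be tried without network access.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def trace_redirects(start_url, fetch, max_hops=10):
    """Follow redirects hop by hop and return a list of (url, status) pairs.

    `fetch` is any callable that takes a URL and returns
    (status_code, location_header_or_None) without following redirects.
    """
    hops = []
    url = start_url
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # the Location header may be relative
    return hops


def fake_fetch(url):
    """Hypothetical stand-in for the site: the seed URL redirects to the login page."""
    responses = {
        "http://xyz.com/": (
            302,
            "https://xyzLogin.com/opensso/UI/Login?realm=/xyz&goto=http://xyz.com/",
        ),
    }
    return responses.get(url, (200, None))


hops = trace_redirects("http://xyz.com/", fake_fetch)
for url, status in hops:
    print(status, url)
```

Against a real site, fetch would issue the request with automatic redirect
following turned off, so that each hop shows up separately.  Each URL in the
output is a candidate for a login page entry.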

Karl




On Mon, Sep 21, 2015 at 7:27 PM, Deepa Thakur <dipathakur@yahoo.com> wrote:

> Hello,
> I am trying to configure ManifoldCF 2.1 Web Connector to crawl my intranet
> site and it uses OpenAM for authorization. The sequence of steps involved
> to get to intranet home page is:
>
> 1.      Client requests http://xyz.com/
> 2.      If no valid authorization token is presented, client is
> redirected to
> https://xyzLogin.com/opensso/UI/Login?realm=/xyz&goto=http://xyz.com/
> 3.      Client submits two login forms (frm1 and frm2) and expects a
> valid userid and password.
> 4.      Client is given 3 cookies and JSESSIONID and then gets redirected
> back to http://xyz.com/
>
>
>
> In the Access Credentials tab I am defining the following login sequence:
>
> URL Regular Expression = xyz.com
>
> *Step 1: *
> Login URL Regular expression =
> https://xyzLogin.com/opensso/UI/Login?realm=/xyz&goto=http://xyz.com/
> Page type = form
> Identification regular expression = frm1
>
> In the Override form parameters section:
> Parameter regular expression = IDToken1               Value=solruser
>
> *Step 2:*
> Login URL Regular expression = frm1
> Page type = form
> Identification regular expression = frm2
>
> In the Override form parameters section:
> Parameter regular expression = IDToken2               Value=<password>
>
> *Step 3:*
> Login URL Regular expression = frm2
> Page type = form
> Identification regular expression = post
>
> In the Override form parameters section:
> Parameter regular expression = Cookie1                 Value= <some value>
> Parameter regular expression = Cookie2                 Value= <some value>
> Parameter regular expression = Cookie3                 Value= <some value>
> Parameter regular expression = JSESSIONID           Value= <some value>
>
> *Step 4:*
> Login URL Regular expression =
> https://xyzLogin.com/opensso/UI/Login?realm=/xyz&goto=http://xyz.com
> Page type = form
>
> When I run the job, with seed URL http://xyz.com,
> contents of login page (https://xyzLogin.com/opensso/UI/Login)
> are indexed into Solr. In
> Simple History, I see Result code = RESPONSECODENOTINDEXABLE when it
> processes identifier http://xyz.com.
>
> Can you please tell me how I can fix the login sequence so that, after the
> cookies are set, the connector knows to redirect to the seed URL? Also, how
> do I prevent the login page from getting indexed into Solr?
>
> Thanks,
> D.T
