manifoldcf-user mailing list archives

From Michael Kooloos <mkool...@hotmail.com>
Subject RE: Web connector - Session-based access credentials
Date Thu, 30 Aug 2012 15:38:20 GMT

Karl,

My seed document is not a logon page, but the seed document URL automatically redirects to
the logon page. So the first regex matches the logon page, and the regex for the Login URL
is the same (since it's the logon page), with type = Form. Do I need to define a redirect
after the logon form?

Hope this makes a bit of sense.

Didn't think it would be that hard to set up some access credentials.

> Date: Thu, 30 Aug 2012 10:03:20 -0400
> Subject: Re: Web connector - Session-based access credentials
> From: daddywri@gmail.com
> To: user@manifoldcf.apache.org
> 
> It sounds like your regular expression(s) which describe what pages
> belong to the logon sequence may be incorrect.  After the logon
> sequence exits, the crawler will attempt to refetch the page it was
> working on before it entered the logon sequence.  If that page is PART
> of the logon sequence it will loop as you describe.
> 
> Your seed documents should therefore NOT be logon pages or you will
> never get anywhere...
> 
> Karl
> 
> On Thu, Aug 30, 2012 at 9:58 AM, Michael Kooloos <mkooloos@hotmail.com> wrote:
> > Karl,
> >
> > I've read through the similar problems/questions on the list (only found 3),
> > but without any luck. In the Seed I have the page I want to crawl, but this one
> > is protected by security, so I set up a redirect to the login-page and a form
> > for the login-page with the username/password parameters. When I look in the
> > Simple History I see the fetch of the first page, the begin-logon, redirect
> > to the login-page, the end-logon, but then it starts all over again and
> > keeps in a loop. Any ideas? I think a working example will help me a lot..
> >
> > Michael
> >
> >> Date: Thu, 30 Aug 2012 09:29:08 -0400
> >> Subject: Re: Web connector - Session-based access credentials
> >> From: daddywri@gmail.com
> >> To: user@manifoldcf.apache.org
> >
> >>
> >> I set it up to crawl Angie's List at one point. It was developed to
> >> crawl an oil-and-gas exploration subscription site. Others have
> >> fielded fairly detailed questions and/or problems to this list, so I
> >> know it has been used by many.
> >>
> >> Can you give a more thorough and detailed description of what you are
> >> trying to crawl, and what is happening for you?
> >>
> >> Karl
> >>
> >> On Thu, Aug 30, 2012 at 9:25 AM, Michael Kooloos <mkooloos@hotmail.com>
> >> wrote:
> >> >
> >> > Hi,
> >> >
> >> > Does anyone have a working example of the session-based access
> >> > credentials
> >> > for the web connector? I'm following the end-user documentation as
> >> > closely as possible, but still no luck :(
> >> >
> >> > Thanks!
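
Karl's explanation of the loop (the crawler refetches the page it was working on once the logon sequence ends, and re-enters the sequence if that page itself matches a logon-page pattern) can be checked offline before running a job. The following is only a minimal sketch in Python, not ManifoldCF code: the URLs, the patterns, and the helper name urls_in_logon_sequence are all hypothetical, and it assumes the patterns are matched anywhere in the URL, which may not be exactly how the web connector applies them.

```python
import re


def urls_in_logon_sequence(urls, logon_page_patterns):
    """Return the URLs matched by at least one logon-page pattern.

    Assumption: a pattern counts as matching if it is found anywhere
    in the URL (re.search), not only if it matches the whole URL.
    """
    compiled = [re.compile(p) for p in logon_page_patterns]
    return [u for u in urls if any(c.search(u) for c in compiled)]


# Hypothetical seed and patterns, for illustration only.
seeds = ["http://example.com/protected/index.html"]

# Too broad: also matches the seed, so the refetch after logon would
# land on a page that is itself part of the logon sequence.
too_broad = [r"example\.com/"]

# Narrow: matches only the (hypothetical) logon page.
narrow = [r"example\.com/login\.jsp"]

print(urls_in_logon_sequence(seeds, too_broad))
print(urls_in_logon_sequence(seeds, narrow))
```

If the first call reports the seed URL, the logon-page expressions are probably too broad for that seed; an empty result, as with the narrower pattern, means the seed itself is not treated as part of the logon sequence.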
 		 	   		  