cocoon-users mailing list archives

From "Sal Mangano" <smang...@ureach.com>
Subject RE: Preventing session from timing out during search indexing
Date Fri, 06 Aug 2004 05:39:42 GMT
So, as usual, when I post a question to this list I end up finding the
solution myself, after much pain ... sigh.

The solution is nice because it solves both the Lucene problem and the
problem of external robots, like Google, trying to index the site.
Basically, I treat the user-agent name of the robot as a user id. (This
lets me control which robots index the site and what they see, which has
other nice side effects that I won't get into here because they are
peculiar to my situation.) In any case, the robot is authenticated in the
normal way and its information is stored in the session. I then arrange for
all URLs that the robot will crawl to be encoded with the session id, so
that every request the crawler makes stays attached to that session. This
is done using an XSLT transform on <a href="someurl"> elements.
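
A minimal sketch of such a transform, not the exact stylesheet from my
site. It assumes the session id arrives as a stylesheet parameter (the name
"session-id" and however the pipeline supplies it are hypothetical), that
the container recognises the usual ";jsessionid=" path parameter, and that
the <a> elements are in no namespace:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: the current session id, supplied by the
       pipeline. Defaults to the empty string when it is not passed. -->
  <xsl:param name="session-id"/>

  <!-- Identity template: copy everything through unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rewrite the href attribute of <a> elements. -->
  <xsl:template match="a/@href">
    <xsl:attribute name="href">
      <xsl:choose>
        <!-- Leave the link alone when there is no session id, when it
             contains a colon (http:, mailto:, ...), when it is a pure
             fragment link, or when it already carries a session id. -->
        <xsl:when test="$session-id = ''
                        or contains(., ':')
                        or starts-with(., '#')
                        or contains(., ';jsessionid=')">
          <xsl:value-of select="."/>
        </xsl:when>
        <!-- The session id goes before the query string. -->
        <xsl:when test="contains(., '?')">
          <xsl:value-of select="concat(substring-before(., '?'),
                                       ';jsessionid=', $session-id,
                                       '?', substring-after(., '?'))"/>
        </xsl:when>
        <!-- No query string: the session id goes before the fragment. -->
        <xsl:when test="contains(., '#')">
          <xsl:value-of select="concat(substring-before(., '#'),
                                       ';jsessionid=', $session-id,
                                       '#', substring-after(., '#'))"/>
        </xsl:when>
        <!-- Plain relative link: append the session id. -->
        <xsl:otherwise>
          <xsl:value-of select="concat(., ';jsessionid=', $session-id)"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```

With session-id set to ABC123, <a href="docs/page.html?x=1"> would come out
as <a href="docs/page.html;jsessionid=ABC123?x=1">. Skipping any href that
contains a colon is a crude way to avoid handing the session id to external
sites; it also skips relative links that happen to contain a colon.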


> -----Original Message-----
> From: Sal Mangano [mailto:smangano@ureach.com] 
> Sent: Thursday, August 05, 2004 4:59 PM
> To: users@cocoon.apache.org
> Subject: RE: Preventing session from timing out during search indexing
> 
> 
> Okay, further investigation shows that the session is not 
> timing out. The problem is that the indexer crawling the site 
> is not attached to the session. I am still not sure how to 
> fix it but have some ideas. Would appreciate help just the same, PLEASE!
> 
> > -----Original Message-----
> > From: Sal Mangano [mailto:sal.mangano@into-technology.com]
> > Sent: Thursday, August 05, 2004 3:45 PM
> > To: users@cocoon.apache.org
> > Subject: Preventing session from timing out during search indexing
> > 
> > 
> > I am using Cocoon 2.1.5 and Tomcat 4.1.3
> > 
> > My site is constructed such that a user must be logged in to
> > access old content. A protected pipeline is set up using 
> > <map:match type="regexp-session" .../> to control access. 
> > This all works fine.
> > 
> > However, when it comes time to build my Lucene search index,
> > trouble begins. On my dev box the search index can take 1 
> > hour to build. Since the index involves gaining access to 
> > these protected pipelines the session must stay valid until 
> > the indexing is done. I use an xsp to kick off the search 
> > indexing and the relevant part looks like:
> > 
> >       <!-- Make sure session does not expire before indexing is finished -->
> >       <xsp-session:set-max-inactive-interval interval="-1"/>
> >       <xsp-session:set-attribute name="role">USER PUBLISHER</xsp-session:set-attribute>
> >       createIndex(baseURL, create);
> > 
> > As I tail the access.log I can see the index building process
> > going along fine on its merry way for a time. Then, all of a 
> > sudden, I see all accesses being redirected to the URL 
> > restricted.html, which is exactly what happens when there 
> > is no session or the session has timed out.
> > 
> > Why is this not fixed by <xsp-session:set-max-inactive-interval
> > interval="-1"/> or <xsp-session:set-max-inactive-interval
> > interval="8000"/>?
> > 
> > Any hints or alternate strategies would be appreciated.
> > 
> > -Sal
> > 
> > ---------------------------------------------------------
> > Sal Mangano
> > Into Technology Inc.
> > www.into-technology.com
> > 
> > Use XSLT? Try the XSLT Cookbook
> > http://www.oreilly.com/catalog/xsltckbk/  
> > 
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
> > For additional commands, e-mail: users-help@cocoon.apache.org
> > 
> > 
> > 

