cocoon-users mailing list archives

From: Joshua.Schairb...@chase.com
Subject: Re: Several Crawlers with different configurations / Lucene Index
Date: Tue, 30 Aug 2005 14:00:34 GMT

Christoph,

I'm not sure about configuring a second indexer/crawler in
cocoon.xconf, but I do know how to restrict the path.  When you configure
the crawler, add an <exclude> element containing a regular expression for
the paths to skip.  For example, this would exclude all files within folders
named 'search':
<cocoon-crawler>
      <exclude>.*/search/.*</exclude>
</cocoon-crawler>
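
To see which URLs an expression like this would catch, here is a minimal sketch in plain Java. It uses java.util.regex as a stand-in, which is an assumption: the crawler's own regular-expression engine may differ in its details, although a pattern this simple should behave the same way. The class name ExcludeDemo and the sample URLs are made up for illustration.

```java
import java.util.regex.Pattern;

public class ExcludeDemo {
    // Same expression as in the <exclude> element above.
    static final Pattern EXCLUDE = Pattern.compile(".*/search/.*");

    // True if the whole URL matches the exclude expression,
    // i.e. the URL contains a path segment named exactly "search".
    static boolean isExcluded(String url) {
        return EXCLUDE.matcher(url).matches();
    }

    public static void main(String[] args) {
        // Hypothetical URLs, for illustration only.
        String[] urls = {
            "http://www.example.com/search/index.html",
            "http://www.example.com/foo/search/results.html",
            "http://www.example.com/foo/bar/index.html",
            "http://www.example.com/research/paper.html",
            "http://www.example.com/search"
        };
        for (String url : urls) {
            System.out.println((isExcluded(url) ? "skip   " : "crawl  ") + url);
        }
    }
}
```

Note that the expression requires a slash on both sides of 'search', so a folder named 'research' is not affected, and neither is a URL that ends in '/search' without a trailing slash.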
If you figure out how to configure the second indexer/crawler, I would be
very interested in finding out.  There are ways to restrict access to parts
of the index, but I am not familiar enough with them to help you.  There is
also an excellent tool, Luke, for inspecting what is actually in your
index.  I downloaded it from here:

http://www.getopt.org/luke/

Hopefully, this was helpful to you.

Regards,
Joshua


Christoph Hermann <christoph.hermann@guschtel.de>
08/30/2005 07:15 AM
Please respond to users

To:       users@cocoon.apache.org
cc:
Subject:  Several Crawlers with different configurations / Lucene Index




Hello,

I wanted to know if there is a way to configure several different
indexers/crawlers (in cocoon.xconf?) so that, e.g., crawler one only crawls
URLs under a certain directory, e.g. http://www.example.com/foo/bar (this
crawler would NOT visit example.com/baz/boo), and crawler two crawls the
entire site (example.com).

In cocoon.xconf there seems to be only one place to specify
configuration options. I have already modified the LuceneUtil Java class so
that I can create different indexes in whatever directory I want, but
I also need a way to restrict the crawling process a
little more.

I thought about specifying different views for different crawlers, but
it seems I cannot specify two crawlers.

Is there a way to do this?

With kind regards,
Christoph

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org






