cocoon-users mailing list archives

From Vadim Gritsenko <vadim.gritse...@verizon.net>
Subject Re: still trying to create a searchable lucene index of my xml docments
Date Tue, 03 Sep 2002 13:38:52 GMT
Harry J. Foxwell wrote:

>I have a collection of xml documents in http://cs2.gmu.edu:8080/cocoon/Marchive,
>a.xml, b.xml, etc.
>
>I have an AE.xsl stylesheet in the Marchive directory that successfully transforms
>my documents into html when I browse to
>http://cs2.gmu.edu:8080/cocoon/Marchive/a, for example.  So far, so good.
>
>My pipeline for Marchive is simple:
>
>	...
>	<map:pipeline>
>	<map:match pattern="Marchive/*">
>	<map:generate src="Marchive/{1}.xml"/>
>	<map:transform src="Marchive/AE.xsl"/>
>	<map:serialize/>
>	</map:match>
>	</map:pipeline>
>	...
>
>Next, I want to create a searchable lucene index of Marchive, so that when
>I search for an element content, I get a list of
>
>	http://cs2.gmu.edu:8080/cocoon/Marchive/a
>	http://cs2.gmu.edu:8080/cocoon/Marchive/d
>	etc
>
>and when I select one of these I want the transformed-to-html version displayed.
>Simple, I thought, but I can't modify the search example to generate
>my index.  This SHOULD be easy, but I can't figure out the required
>pipeline or the format of the xml file containing links to my files.
>  
>

1. You need a starting URL, so the crawler can begin there and go
through all your documents.
2. The content view should give meaningful results. Its output is what
will be indexed.
3. The links view should give a list of links. Otherwise the crawler
won't find where to go next from the starting page.

Read more about views in the docs.
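
A minimal sketch of what points 2 and 3 could look like in the sitemap,
assuming the view names "content" and "links" used by the stock Cocoon 2.0
sample sitemap, and assuming a serializer named "links" is declared under
map:serializers. The label="content" attribute on the generator is an
addition to the pipeline quoted above, not something from the original
sitemap:

```xml
<!-- Sketch only: view and label names are assumptions taken from the stock sample sitemap. -->
<map:views>
  <!-- content view: leaves the pipeline at the component labelled "content",
       so the raw XML (not the HTML) is what gets indexed -->
  <map:view name="content" from-label="content">
    <map:serialize type="xml"/>
  </map:view>
  <!-- links view: takes the end of the pipeline and lists the links found there -->
  <map:view name="links" from-position="last">
    <map:serialize type="links"/>
  </map:view>
</map:views>

<map:pipeline>
  <map:match pattern="Marchive/*">
    <!-- label marks the point where the content view branches off -->
    <map:generate src="Marchive/{1}.xml" label="content"/>
    <map:transform src="Marchive/AE.xsl"/>
    <map:serialize/>
  </map:match>
</map:pipeline>
```

To check what the crawler will see, request a page with the view
parameter, for example
http://cs2.gmu.edu:8080/cocoon/Marchive/a?cocoon-view=content and
http://cs2.gmu.edu:8080/cocoon/Marchive/a?cocoon-view=links. If the links
view of the starting page comes back empty, the crawler has nowhere to go.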

Vadim





