cocoon-users mailing list archives

From Vadim Gritsenko
Subject Re: AW: How to create a mapped lucene index for linked-xml files ?
Date Fri, 23 Aug 2002 13:55:28 GMT
Florian Georg wrote:

>Maybe we can stem it together :)
>I brooded again some time over this question, so here are my thoughts.
>AFAIK Cocoon uses some crawler to do the indexing.


>After a look at the example xml, it seems to crawl through
><link href="foobar">Wombat</link> - Tags.

Not tags. It follows attributes: href, src, background. IIRC, xlink is also supported.
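
To illustrate, here is a minimal sketch of a source page with the kinds of links the crawler should pick up. The element names and file paths are made up for the example; only the attribute names come from the list above, and the xlink case is from memory:

```xml
<!-- Hypothetical page; element names and paths are illustrative only -->
<page xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- followed via the href attribute -->
  <link href="about_us">About us</link>
  <!-- followed via the src attribute -->
  <img src="images/wombat.png"/>
  <!-- followed via the background attribute -->
  <body background="images/bg.png"/>
  <!-- followed via xlink:href, assuming xlink is supported as recalled -->
  <ref xlink:href="contact">Contact</ref>
</page>
```

What matters is the attribute, not the element that carries it.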

>The index content is some mystically-hash-coded-compressed-index-metadata
>I suppose :)
>It is stored in the context-dir
>($TOMCAT_HOME/work/cocoon/localhost/cocoon-files/index to me)
>You don't need to know about Lucene's internals, I think (hope)...

That is configurable. The default is <work-dir/>/index

>Concerning the question about the indexing and sitemap, I found a
>solution, that works for me :
>First, I define a match which outputs my plain xml files :
>   <map:match pattern="xml/**">
>     <map:generate src="xml/{1}.xml"/>
>     <map:serialize type="xml"/>
>   </map:match>

The crawler uses the links view; in your case it will be:
    <map:generate .../>
    <map:serialize type="links"/>

The indexer uses the content view.

See the view definitions in the <map:views/> section of the sitemap.
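
For reference, here is a sketch of what such view definitions typically look like. It is modelled on the sample sitemap that ships with Cocoon as I remember it, so check the names and labels against your own sitemap rather than taking this as definitive:

```xml
<!-- Sketch of a views section; view names and labels are assumed
     to match the sample sitemap and may differ in your setup -->
<map:views>
  <!-- content view: taken at the component labelled "content"
       (e.g. the generator) and serialized as plain XML for the indexer -->
  <map:view name="content" from-label="content">
    <map:serialize type="xml"/>
  </map:view>

  <!-- links view: taken at the end of the pipeline and serialized
       as a list of links for the crawler -->
  <map:view name="links" from-position="last">
    <map:serialize type="links"/>
  </map:view>
</map:views>
```

With views like these, a view is normally requested by adding the cocoon-view parameter to the page URL, for example .../mysite/xml/index?cocoon-view=links (parameter name from memory, so verify it).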

>(Due to the "xml/**" - pattern I can do relative links within my xml :
>  About <link href="about_us">us</link>
>--> links to "xml/about_us.xml")
>Next, I build the index with the sample indexer (crawler starting at
>BaseURL = http://localhost:8080/cocoon/mysite/xml/index).
>Now, I install the Cocoon Search Generator :
><map:generators default="file" label="content">
>  <map:generator name="search"
>      src="org.apache.cocoon.generation.SearchGenerator" label="content"/>
></map:generators>
><map:match pattern="search">
>  <map:generate type="search"/>
>  <map:transform type="log"/>
>  <map:transform src="search2xhtml.xsl"/>
>  <map:serialize type="html"/>
></map:match>
>Finally (after building the index) I could search by using

The generator will search and return the matched documents and their *URLs*.
In your case these will be URLs like .../mysite/xml/...

>I don't know if this helps you, but I think it'll be OK for me.
>  Florian
>-----Original Message-----
>From: []
>Sent: Friday, 23 August 2002 14:25
>Subject: Re: How to create a mapped lucene index for linked-xml files ?
>The cocoon/lucene example works but is not clear (to me)
>as to how to modify for new purpose...a brief how-to would
>be very useful.  For example,
>	where do you place the files to be indexed (anywhere?) and
>	how do you point cocoon/lucene to them?

It does not index files. It indexes the site, i.e. the URLs the crawler can reach.


>	where is the index created? what do the index contents
>	look like?
>	what specific sitemap changes must be made to locate/index/search
>	the files?
>I've looked at the lucene docs and at the cocoon example, and
>still haven't gotten it all clear...glad I'm not the only one!
