cocoon-users mailing list archives

From Vadim Gritsenko <vadim.gritse...@verizon.net>
Subject Re: AW: How to create a mapped lucene index for linked-xml files ?
Date Fri, 23 Aug 2002 13:55:28 GMT
Florian Georg wrote:

>Maybe we can stem it together :)
>
>I brooded again some time over this question, so here are my thoughts.
>
>AFAIK Cocoon uses some crawler to do the indexing.
>

Yes.


>After a look at the example xml, it seems to crawl through
><link href="foobar">Wombat</link> - Tags.
>

Not tags. It follows attributes: href, src, background. IIRC, xlink is also supported.
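To illustrate, here is a made-up page fragment (the element names, file names, and the xlink usage are assumptions for the example only, not taken from the Cocoon samples). Going by the attribute list above, each of these attributes would be picked up as a link regardless of the element it sits on:

```xml
<!-- Hypothetical source document; only the attribute names matter here -->
<page xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- href: followed, whatever the element is called -->
  <link href="about_us">us</link>
  <!-- src: followed as well -->
  <img src="images/logo.png"/>
  <!-- background: followed as well -->
  <body background="images/bg.png"/>
  <!-- xlink:href: supported too, if memory serves -->
  <ref xlink:href="contact"/>
</page>
```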


>The index content is some mystically-hash-coded-compressed-index-metadata
>I suppose :)
>It is stored in the context-dir
>($TOMCAT_HOME/work/cocoon/localhost/cocoon-files/index to me)
>You don't need to know about Lucene's internals, I think (hope)...
>

Configurable. Default is <work-dir/>/index


>Concerning the question about the indexing and sitemap, I found a
>solution, that works for me :
>
>
>First, I define a match which outputs my plain XML files:
>
>   <map:match pattern="xml/**">
>     <map:generate src="xml/{1}.xml"/>
>     <map:serialize type="xml"/>
>   </map:match>
>

The crawler uses the links view; in your case that amounts to:
    <map:generate .../>
    <map:serialize type="links"/>

The indexer uses the content view.

See the view definitions in the <map:views/> section of the sitemap.
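
For reference, the view definitions in the sample sitemap look roughly like the sketch below. This is written from memory and assumes the stock serializer names "xml" and "links" and the label "content"; check your own sitemap.xmap for the exact definitions.

```xml
<map:views>
  <!-- content view: taps the pipeline at the component labelled "content"
       and serializes what it finds there as plain XML (used by the indexer) -->
  <map:view name="content" from-label="content">
    <map:serialize type="xml"/>
  </map:view>

  <!-- links view: taken at the end of the pipeline and serialized
       as a list of links (used by the crawler) -->
  <map:view name="links" from-position="last">
    <map:serialize type="links"/>
  </map:view>
</map:views>
```

A view is normally selected with the cocoon-view request parameter, so the crawler would request something like http://localhost:8080/cocoon/mysite/xml/index?cocoon-view=links and the indexer the same URL with cocoon-view=content. The content view can only find something if the generator in the matched pipeline carries label="content".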



>(Due to the "xml/**" - pattern I can do relative links within my xml :
>  About <link href="about_us">us</link>
>--> links to "xml/about_us.xml")
>
>
>Next, I build the index with the sample indexer (crawler starting at
>index.xml)
>BaseURL = http://localhost:8080/cocoon/mysite/xml/index
>
>
>Now, I install the Cocoon Search Generator :
>
><map:generators default="file" label="content">
>  <map:generator name="search"
>src="org.apache.cocoon.generation.SearchGenerator" label="content" />
></map:generators>
>...
><map:match pattern="search">
> <map:generate type="search" />
> <map:transform type="log" />
> <map:transform src="search2xhtml.xsl" />
> <map:serialize type="html"/>
></map:match>
>
>
>Finally (after building the index) I could search by using
>"search?queryString=Baz"
>

The generator runs the search and returns the matched documents together with
their *URLs*. In your case these will be URLs like .../mysite/xml/...


>I don't know if this helps you, but I think it'll be OK for me.
>
>greetings
>  Florian
>
>
>
>
>-----Original Message-----
>From: hfoxwell@cs.gmu.edu [mailto:hfoxwell@cs.gmu.edu]
>Sent: Friday, 23 August 2002 14:25
>To: cocoon-users@xml.apache.org
>Subject: Re: How to create a mapped lucene index for linked-xml files ?
>
>
>
>The cocoon/lucene example works but is not clear (to me)
>as to how to modify for new purpose...a brief how-to would
>be very useful.  For example,
>
>	where do you place the files to be indexed (anywhere?) and
>	how do you point cocoon/lucene to them?
>

It does not index files. It indexes the site, that is, whatever the sitemap
serves at the URLs the crawler reaches.

Vadim


>	where is the index created? what do the index contents
>	look like?
>
>	what specific sitemap changes must be made to locate/index/search
>	the files?
>
>I've looked at the lucene docs and at the cocoon example, and
>still haven't gotten it all clear...glad I'm not the only one!
>  
>




