cocoon-dev mailing list archives

From Stefano Mazzocchi
Subject Re: Searching XML content using lucene
Date Mon, 03 Dec 2001 11:24:05 GMT
Bernhard Huber wrote:
> Hi,
> There were some mails regarding semantic searching, and using lucene
> as an indexing engine, some time ago.


> For all who are interested in indexing & searching xml, some notes about
> the implementation, which is just at the beginning:
> I have now implemented some avalon components for:
> 1) Crawling cocoon-view=content, cocoon-view=links
> 2) Indexing xml documents, as a sample I took the /cocoon/documents URI
> space.

Wow, sounds very cool. How do you feel about sharing/donating that code?
I'd be very interested in working on that.

> The lucene documents have the following fields:
> * url: the url of the document
> * body: the raw text of all elements of the document
> * Moreover, each element, and each attribute of an element, generates a
> field, too.
> Thus searching for "Introduction" searches the body field by default.
> Searching for "s1@title:Introduction" searches only for documents having
> an attribute title in s1 element matching Introduction.
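
As an illustration of the field layout described above, here is a minimal
sketch in plain Java. It is not Bernhard's code and does not use the Lucene
API: the class XmlFieldSketch, its methods, and the sample URL are all
hypothetical, fields are modelled as a simple map, and matching is a
case-insensitive substring test rather than Lucene's analyzed term search.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: models the url/body/element/attribute fields with a map
// instead of real Lucene documents.
class XmlFieldSketch {

    static void add(Map<String, List<String>> fields, String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /** Builds the fields for one XML document: url, body, element, element@attribute. */
    static Map<String, List<String>> toFields(String url, String xml) throws Exception {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        add(fields, "url", url);
        StringBuilder body = new StringBuilder();
        // Direct text of each currently open element (innermost last).
        List<StringBuilder> texts = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                texts.add(new StringBuilder());
                // One field per attribute, named element@attribute.
                for (int i = 0; i < atts.getLength(); i++) {
                    add(fields, qName + "@" + atts.getQName(i), atts.getValue(i));
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                body.append(ch, start, length);
                if (!texts.isEmpty()) {
                    texts.get(texts.size() - 1).append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                String text = texts.remove(texts.size() - 1).toString().trim();
                body.append(' '); // keep text of adjacent elements apart
                if (!text.isEmpty()) {
                    add(fields, qName, text); // one field per element with text
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        add(fields, "body", body.toString().trim().replaceAll("\\s+", " "));
        return fields;
    }

    /** "field:term" searches that field; a bare term searches body by default. */
    static boolean matches(Map<String, List<String>> fields, String query) {
        int colon = query.indexOf(':');
        String field = colon < 0 ? "body" : query.substring(0, colon);
        String term = (colon < 0 ? query : query.substring(colon + 1)).toLowerCase();
        List<String> values = fields.getOrDefault(field, Collections.<String>emptyList());
        for (String value : values) {
            if (value.toLowerCase().contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<document><s1 title=\"Introduction\"><p>Introduction to Cocoon</p></s1>"
                + "<s1 title=\"Setup\"><p>Install it</p></s1></document>";
        Map<String, List<String>> fields = toFields("/cocoon/documents/sample.xml", xml);
        System.out.println(fields);
        System.out.println(matches(fields, "Introduction"));
        System.out.println(matches(fields, "s1@title:Introduction"));
    }
}
```

In a real index, each map entry would presumably become a field of the
Lucene document, and the query strings would go through Lucene's own
query parser with body as the default field.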

> I have some questions; maybe someone can help:
> * How can I avoid generating a full http-request? As the crawler sits
> inside of cocoon and indexes a URI space of the current cocoon engine,
> there should be(?) some method for accessing the sitemap and
> forwarding the crawling request to it, which would speed up the
> indexing step.

The Cocoon CLI does crawling internally without the overhead of HTTP.

Follow the flow at Cocoon.main() to know how that is done.

Hope this helps.

Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
                               Friedrich Nietzsche
