cocoon-dev mailing list archives

From giacomo <giac...@apache.org>
Subject Re: Semantic searching, lucene integration, some experiences
Date Tue, 04 Dec 2001 09:00:50 GMT
On Sun, 2 Dec 2001, Bernhard Huber wrote:

> Hi,
> There were some mails a while ago about semantic searching and using
> Lucene as an indexing engine.
> For all who are interested in indexing & searching XML, here are some
> notes about the implementation, which is just at the beginning:
>
> I have now implemented some Avalon components for:
> Crawling, using cocoon-view=content and cocoon-view=links.
> For now I'm issuing a full HTTP request for each document that should
> get generated.
>
> Indexing XML documents; as a sample I took the /cocoon/documents URI space.
> The Lucene documents have the following fields:
> * url: the URL of the document
> * body: the raw text of all elements of the document
> * Moreover, each element, and each attribute of an element, generates a
>   field, too.
> Thus searching for "Introduction" searches the body field by default.
> Searching for "s1@title:Introduction" searches only for documents having
> a title attribute on an s1 element that matches Introduction.
>
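The field layout described above can be sketched with nothing but the JDK's SAX parser. This is only an illustration under assumptions, not the code from the components mentioned: the class name XmlFieldExtractor and its extract method are hypothetical, text is credited to the innermost enclosing element only (the message does not say how nested text is assigned), and the result is a plain map rather than a Lucene document. Each map entry would correspond to one field of the indexed document.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of Cocoon or Lucene.
final class XmlFieldExtractor {

    /** Flattens one XML document into field name -> text, using the url/body/element/element@attribute naming. */
    static Map<String, String> extract(String url, String xml) throws Exception {
        final Map<String, StringBuilder> fields = new LinkedHashMap<>();
        fields.put("url", new StringBuilder(url));
        fields.put("body", new StringBuilder());

        DefaultHandler handler = new DefaultHandler() {
            // Names of the currently open elements, innermost first.
            private final Deque<String> open = new ArrayDeque<>();

            private StringBuilder field(String name) {
                return fields.computeIfAbsent(name, k -> new StringBuilder());
            }

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                open.push(qName);
                // Treat element boundaries as word breaks.
                field("body").append(' ');
                field(qName).append(' ');
                for (int i = 0; i < atts.getLength(); i++) {
                    field(qName + "@" + atts.getQName(i)).append(' ').append(atts.getValue(i));
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                open.pop();
                field("body").append(' ');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // All text goes to body; assumption: it is also credited
                // to the innermost open element only.
                field("body").append(ch, start, length);
                String current = open.peek();
                if (current != null) {
                    field(current).append(ch, start, length);
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);

        // Collapse whitespace and drop fields that ended up empty.
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> e : fields.entrySet()) {
            String value = e.getValue().toString().replaceAll("\\s+", " ").trim();
            if (!value.isEmpty()) {
                result.put(e.getKey(), value);
            }
        }
        return result;
    }
}
```

Calling XmlFieldExtractor.extract(url, xml) on a document containing an s1 element with title="Introduction" yields an "s1@title" entry, which is the field a query like s1@title:Introduction would target. One caveat of this sketch: an element that is itself named url or body would collide with the two fixed fields, so a real indexer would need some prefixing scheme.
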
> I have some questions, maybe you can help:
> * How can I avoid generating a full HTTP request? As the crawler sits
>   inside Cocoon and indexes a URI space of the current Cocoon engine,
>   there should be(?) some method for accessing the sitemap and forwarding
>   the crawling request to it, which would speed up the indexing step.

Look at how the CLI environment does it (start at org.apache.cocoon.Main).

Giacomo



