cocoon-dev mailing list archives

From "Bernhard Huber" <>
Subject Re: Searching XML content using lucene
Date Mon, 03 Dec 2001 22:55:04 GMT

>Wow, sounds very cool. How do you feel about sharing/donating that code?
>I'd be very interested in working on that.
Just don't expect too much, it is just a first shot... I hope you
manage to make it run at your site.
A lot of stuff is not configurable yet, not having had time to implement it.

* install a lucene.jar from the lucene site
* the lucene index is created in <work-dir>/index.
* create the index by requesting:  createindex.xsp
* search the index by requesting: searchindex.xsp, entering a query
string; I skipped implementing paging for when lots of matches are found
* see statistics about the created index using statisticindex.xsp, which may be
used to help searching more effectively
* load my.roles for declaring the new avalon components regarding 

DocumentHandler parses the XML document, implements the XML to lucene
Document generation,
and creates the fields of the lucene document,
Lucene document does NOT store any xml content,
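
To make the idea concrete, here is a minimal sketch of how such a handler might turn SAX events into field names and values. It is not the code from this mail: the class name FieldCollector, the plain Map standing in for a lucene Document, and the choice to put all character data into one "body" field are assumptions for illustration. Only the element@attribute naming and the "body" default field are taken from the search help further down.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: collects field name -> values from one XML document.
// A real handler would add these as fields of a lucene Document instead.
public class FieldCollector extends DefaultHandler {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    private void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // Each attribute becomes its own field named element@attribute.
        for (int i = 0; i < atts.getLength(); i++) {
            add(qName + "@" + atts.getQName(i), atts.getValue(i));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only character data is accumulated; no markup is kept.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        // Separate the text of adjacent elements with a space.
        text.append(' ');
    }

    @Override
    public void endDocument() {
        // All text goes into the assumed default field "body", whitespace collapsed.
        add("body", text.toString().trim().replaceAll("\\s+", " "));
    }

    public static Map<String, List<String>> collect(String xml) throws Exception {
        FieldCollector handler = new FieldCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<document><s1 title=\"cocoon\">free text search</s1></document>";
        System.out.println(collect(xml));
    }
}
```

Since only the extracted text and attribute values end up in the fields, nothing of the XML markup itself is kept, which matches the remark above.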

Perhaps you will find a better design; currently I haven't implemented any
SitemapComponents, just
pure avalon components, all named "Simple*", interfaces named
Perhaps you will find some design fitting the components into the generator,
transformer, serializer pattern;
I thought about it but gave up, coming up with this more general
solution. Perhaps
even the ParentCM may be used?

Some feeling about searching:

    Index Search

Search Help

    * free AND "text search" Search for documents containing "free" and
      the phrase "text search".
    * +text search Search for documents containing "text" and
      preferentially containing "search".
    * giants -football Search for "giants" but omit documents containing
      "football".
    * body:john Search for documents containing "john" in the body
      field. The field "body" is used by default. Thus query "body:john"
      is equivalent to query "john".
    * s1@title:cocoon Search for documents containing "cocoon" in the
      cocoon field s1@title, i.e. searching in the title attribute of the
      s1 element of the XML document.
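
The default-field rule from the help text can be sketched as follows. This is only an illustration of the rule for a single term, not lucene's query parser; the class name FieldTerm and the method resolve are made up for the example.

```java
// Hypothetical sketch of the default-field rule for one query term.
public class FieldTerm {
    // Field searched when the term names no field, as stated in the help text.
    static final String DEFAULT_FIELD = "body";

    // Splits "field:term" at the first colon; a bare term gets the default field.
    static String[] resolve(String clause) {
        int colon = clause.indexOf(':');
        if (colon <= 0) {
            return new String[] { DEFAULT_FIELD, clause };
        }
        return new String[] { clause.substring(0, colon), clause.substring(colon + 1) };
    }

    public static void main(String[] args) {
        for (String q : new String[] { "john", "body:john", "s1@title:cocoon" }) {
            String[] r = resolve(q);
            System.out.println(q + " -> field=" + r[0] + " term=" + r[1]);
        }
    }
}
```

With this rule "john" and "body:john" resolve to the same field and term, while "s1@title:cocoon" targets the attribute field.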

SearchResult: Total Hits: 13

Index Statistic <http://localhost:8080/cocoon/lucene/statistic>

Score  Count  URL
100%   0
34%    1
27%    2
27%    3      http://localhost:8080/cocoon/documents/ctwig/ctwig-basic02.html
27%    4      http://localhost:8080/cocoon/documents/ctwig/ctwig-basic02.html
19%    5
16%    6      http://localhost:8080/cocoon/documents/ctwig/ctwig-why.html
10%    7
8%     8      http://localhost:8080/cocoon/documents/userdocs/xsp/logicsheet.html
7%     9      http://localhost:8080/cocoon/documents/faq.html

>The Cocoon CLI does crawling internally without the overhead of HTTP
>Follow the flow at Cocoon.main() to know how that is done.
I will check it out...

bye bernhard
