cocoon-dev mailing list archives

From giacomo <giac...@apache.org>
Subject Re: [Status] Searching XML in Cocoon
Date Sat, 15 Dec 2001 20:09:42 GMT
On Sat, 15 Dec 2001, Bernhard Huber wrote:

>   Hi,
> I'd like to commit Searching XML in Cocoon.
> I must confess that I have not taken the CVS SSH hurdle, yet.

You have the links for it?

> Moreover, I'd like to know which branch I should check in to, and
> whether it goes into src or scratchpad.

I'd recommend scratchpad for now.

> As this is not final, I think inserting into scratchpad would be better,
> moreover people may use and try it first.

Yup.

> I think using a sitemap would be okay for using the searching, and
> indexing, and demonstrating the usage of these components.

I'm not sure what you mean. A sub-sitemap for the samples?

> Oops, and I think I have violated the coding conventions by indenting
> only 2 spaces; I need to reformat before submitting.
> Is there any tool for that?

I do that with (X)Emacs.

> Any comments?
>
> Some docu about the feature...

Would be cool if you can rewrite these docs using DocBook or
Document-v10 DTDs.

Giacomo

>
> Abstract
> Searching XML in Cocoon using Lucene as search engine.
>
> Overview
> Lucene ( http://jakarta.apache.org/lucene ) is an indexing & searching API.
> Several new Cocoon components utilize this API to provide "Searching
> XML in Cocoon".
>
> There are two services provided by these components:
> Indexing
> Searching
>
> Indexing is realized by crawling starting from a base URI, and
> generating a lucene index.
> Searching uses the generated lucene index. The index is searched for a
> requested query.
>
> The crawling component is packaged in org.apache.cocoon.components.crawler.
> Indexing and searching are packaged in org.apache.cocoon.components.search.
> A Cocoon generator using the searching components is packaged in
> org.apache.cocoon.generation.
>
> A GUI for searching is implemented both as an XSP and as a generator.
> Both implementations can be used independently.
>
> Description
>
> As an existing index is a precondition for searching, crawling and
> indexing are described first; a description of the searching follows.
>
> The crawling component provides all links of a requested URI. The links
> of a URI are requested using the Cocoon views feature. A URI that is to
> be crawled must provide a view. By default, the crawling component
> requests the view named links.
> A links view must provide a response of content type
> application/x-cocoon-links. Using a serializer of type links with src
> org.apache.cocoon.serialization.LinkSerializer will guarantee the
> correct content type.
>
> The indexing component crawls in depth, starting from a given base URI.
> The indexing component uses a crawler component to receive all links of
> a page. The indexing component filters the response of the crawler.
> Filtering enforces the following conditions:
> Index only resources which have not been indexed already.
> Index only resources which are indexable, like documents; ignore images
> and non-XML documents.
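
The two filter conditions above can be sketched in a few lines of Java. This is only an illustration under assumptions, not the code of the submitted components: the class name CrawlFilterSketch, the linksOf function (standing in for the crawler's links view request) and the isIndexable predicate are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch; names are not taken from the submitted components.
final class CrawlFilterSketch {

    // Walks the link graph depth-first from baseUri and returns the URIs
    // that pass both filter conditions, in the order they were visited.
    static List<String> collectIndexable(String baseUri,
                                         Function<String, List<String>> linksOf,
                                         Predicate<String> isIndexable) {
        Set<String> seen = new HashSet<>();
        List<String> toIndex = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(baseUri);
        while (!stack.isEmpty()) {
            String uri = stack.pop();
            if (!seen.add(uri)) {
                continue; // condition 1: already handled, do not index twice
            }
            if (!isIndexable.test(uri)) {
                continue; // condition 2: skip images, non-XML documents, ...
            }
            toIndex.add(uri);
            // Assumption: only indexable resources are asked for their links.
            for (String link : linksOf.apply(uri)) {
                stack.push(link);
            }
        }
        return toIndex;
    }
}
```

The seen set also keeps the crawl from looping when two pages link to each other.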
>
> Indexing parses an XML document, and produces a lucene document. A
> lucene document may have several fields, which act like columns of a
> database table.
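
To make the "fields as columns" idea concrete, here is a small Java sketch that turns an XML document into name/value pairs using the JDK's SAX parser. It deliberately avoids the Lucene API. The mapping rule (one field per element name, holding that element's text) is an assumption for illustration; the mail does not say how the submitted indexer maps XML to fields, and the class name XmlFieldSketch is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; the real mapping of XML to lucene fields may differ.
final class XmlFieldSketch {

    // Returns one entry per element name that directly contains text.
    static Map<String, String> fieldsOf(String xml) {
        Map<String, StringBuilder> text = new LinkedHashMap<>();
        Deque<String> open = new ArrayDeque<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                open.push(qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text belongs to the innermost open element.
                text.computeIfAbsent(open.peek(), k -> new StringBuilder())
                    .append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                StringBuilder sb = text.get(qName);
                if (sb != null) {
                    sb.append(' '); // separate text of repeated elements
                }
                open.pop();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("not parseable as XML", e);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> entry : text.entrySet()) {
            String value = entry.getValue().toString().replaceAll("\\s+", " ").trim();
            if (!value.isEmpty()) {
                fields.put(entry.getKey(), value); // drop whitespace-only fields
            }
        }
        return fields;
    }
}
```

Each returned entry would then correspond to one field of the resulting lucene document.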
>
> Indexing writes the lucene index into a directory; by default the Cocoon
> working directory is used. Moreover, a lucene analyzer and the lucene
> writing mode must be defined.
>
> The searching component uses a created lucene index. The index may be
> created by any lucene indexer.
> The searching component must have access to an index directory, and it
> should use the same lucene analyzer as the indexer used when the index
> directory was created.
> The searching component returns all hits of a search; the XSP and the
> generator filter them down to the hits displayed on one page.
>
> The search generator searches the lucene index by using the searching
> components, and generates XML content.
> A sample of the XML content produced by the search generator:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <search:results date="1008437081064" query-string="cocoon"
>   start-index="0" page-length="10"
>   xmlns:search="http://apache.org/cocoon/search/1.0"
>   xmlns:xlink="http://www.w3.org/1999/xlink">
>   <search:hits total-count="125" count-of-pages="13">
>     <search:hit rank="0" score="1.0"
>       uri="http://localhost:8080/cocoon/documents/hosting.html"/>
>     <search:hit rank="1" score="1.0"
>       uri="http://localhost:8080/cocoon/documents/hosting.html"/>
>     <search:hit rank="2" score="1.0"
>       uri="http://localhost:8080/cocoon/documents/hosting.html"/>
>     <search:hit rank="3" score="0.93121004"
>       uri="http://localhost:8080/cocoon/documents/userdocs/actions/actions.html"/>
>     <search:hit rank="4" score="0.93121004"
>       uri="http://localhost:8080/cocoon/documents/userdocs/actions/actions.html"/>
>     <search:hit rank="5" score="0.7112235"
>       uri="http://localhost:8080/cocoon/documents/mail-archives.html"/>
>     <search:hit rank="6" score="0.70967746"
>       uri="http://localhost:8080/cocoon/documents/userdocs/serializers/link-serializer.html"/>
>     <search:hit rank="7" score="0.6881721"
>       uri="http://localhost:8080/cocoon/documents/userdocs/serializers/text-serializer.html"/>
>     <search:hit rank="8" score="0.6881721"
>       uri="http://localhost:8080/cocoon/documents/userdocs/serializers/vrml-serializer.html"/>
>     <search:hit rank="9" score="0.6666666"
>       uri="http://localhost:8080/cocoon/documents/userdocs/serializers/svgpng-serializer.html"/>
>   </search:hits>
>   <search:navigation total-count="125" count-of-pages="13"
>     has-next="true" has-previous="false" next-index="10" previous-index="0">
>     <search:navigation-page start-index="0"/>
>     <search:navigation-page start-index="10"/>
>     <search:navigation-page start-index="20"/>
>     <search:navigation-page start-index="30"/>
>     <search:navigation-page start-index="40"/>
>     <search:navigation-page start-index="50"/>
>     <search:navigation-page start-index="60"/>
>     <search:navigation-page start-index="70"/>
>     <search:navigation-page start-index="80"/>
>     <search:navigation-page start-index="90"/>
>     <search:navigation-page start-index="100"/>
>     <search:navigation-page start-index="110"/>
>     <search:navigation-page start-index="120"/>
>   </search:navigation>
> </search:results>
>
> The navigation element makes it easy to handle paging in an XSLT
> stylesheet.
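
The numbers in the sample (125 hits, page length 10, 13 pages, start indexes 0 to 120) follow from simple paging arithmetic. The Java sketch below reproduces them; it is an assumption about how such values can be computed, not the code of LuceneCocoonPager, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the submitted components.
final class SearchPaging {

    // Pages needed to show totalCount hits with pageLength hits per page.
    static int countOfPages(int totalCount, int pageLength) {
        requirePositive(pageLength);
        return (totalCount + pageLength - 1) / pageLength; // ceiling division
    }

    // One start index per page, like the search:navigation-page elements.
    static List<Integer> pageStartIndexes(int totalCount, int pageLength) {
        requirePositive(pageLength);
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start < totalCount; start += pageLength) {
            starts.add(start);
        }
        return starts;
    }

    static boolean hasNext(int startIndex, int pageLength, int totalCount) {
        return startIndex + pageLength < totalCount;
    }

    static boolean hasPrevious(int startIndex) {
        return startIndex > 0;
    }

    // Clamped at 0, matching previous-index="0" on the first page.
    static int previousIndex(int startIndex, int pageLength) {
        return Math.max(0, startIndex - pageLength);
    }

    private static void requirePositive(int pageLength) {
        if (pageLength <= 0) {
            throw new IllegalArgumentException("pageLength must be positive");
        }
    }

    public static void main(String[] args) {
        System.out.println(countOfPages(125, 10));
        System.out.println(pageStartIndexes(125, 10));
    }
}
```

Having these values precomputed in the generator output means a stylesheet only has to copy attributes into links instead of doing arithmetic in XSLT.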
>
> Bill Of Material:
>
> New packages:
> org.apache.cocoon.components.crawler,
> org.apache.cocoon.components.search
>
> New avalon components:
> org.apache.cocoon.components.crawler.CocoonCrawler
> org.apache.cocoon.components.crawler.SimpleCocoonCrawlerImpl:
>   external http crawler for Cocoon. This crawler generates a list of links
>   received from a URI request, enhancing it with a cocoon-view query.
>
> org.apache.cocoon.components.IndexHelperField
> org.apache.cocoon.components.LuceneCocoonHelper
> org.apache.cocoon.components.LuceneCocoonIndexer
> org.apache.cocoon.components.LuceneCocoonPager
> org.apache.cocoon.components.LuceneCocoonSearcher
> org.apache.cocoon.components.LuceneIndexContentHandler
> org.apache.cocoon.components.LuceneXMLIndexer
> org.apache.cocoon.components.SimpleLuceneCocoonIndexerImpl
> org.apache.cocoon.components.SimpleLuceneCocoonSearcherImpl
> org.apache.cocoon.components.SimpleLuceneXMLIndexerImpl
>
> New sitemap components:
> org.apache.cocoon.generation.SearchGenerator
>
> New JUnit testcase:
> org.apache.cocoon.generation.test.SearchGeneratorTestCase
>
> New webapp resources:
> sitemap.xmap
> search-index.xsp
> welcome-index.xsp
> create-index.xsp
> stylesheets/search2html.xsl
> lucene_green_300.gif
>
> Compiling & Installing:
>
> A lucene.jar is necessary for compiling and at runtime. The build.xml
> needs to be changed, too, to check for its availability, and the webapp
> sitemap must be modified to include the search demo.
>
> Installing the avalon components requires changing the cocoon.xconf
> file to insert the avalon components
> org.apache.cocoon.components.LuceneXMLIndexer
> org.apache.cocoon.components.SimpleLuceneCocoonIndexerImpl
> org.apache.cocoon.components.SimpleLuceneCocoonSearcherImpl
> org.apache.cocoon.components.SimpleLuceneXMLIndexerImpl.
>
> A sitemap or sub-sitemap has to be adapted to use the XSP and the generator.
>
>
> bye bernhard
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
> For additional commands, email: cocoon-dev-help@xml.apache.org