forrest-dev mailing list archives

From "Ramon Prades" <rpra...@porcelanosa.com>
Subject RE: Lucene Search
Date Thu, 14 Aug 2003 15:49:35 GMT
Hi Jeff

Thanks for spotting that. Here is a corrected version.

The approach you suggest is quite interesting, so I'll have a closer look at
it.

Regards.

Ramon
-----Original Message-----
From: Jeff Turner [mailto:jefft@apache.org] 
Sent: Thursday, 14 August 2003 14:35
To: forrest-dev@xml.apache.org
Subject: Re: Lucene Search


Great stuff! :)  Very nicely packaged too.  I ran the script and it all
worked perfectly.  Only glitch is that when a search is run from a
subdirectory, the path to search.cmd is wrong.

My only concern is rather long-term: the indexer uses the raw XML files
directly, and thereby assumes a 1-1 mapping from the filesystem to the URI
space.  With Cocoon, the two are completely separated and need not
correspond.

For instance, we have a status.xml file containing content, which is split
up and served as changes.html and todo.html.  The Lucene indexer's guess of
status.html would be wrong.
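
To make the mismatch concrete, here is a minimal sketch in Python. The
guess_uri helper and the SERVED_PAGES table are hypothetical stand-ins, not
part of Forrest or of the indexer script; they only illustrate how a
file-based guess can diverge from what the sitemap actually serves.

```python
from pathlib import PurePosixPath

# Hypothetical table of what the sitemap actually serves for each source file.
# Only the status.xml entry comes from the example above.
SERVED_PAGES = {
    "status.xml": ["changes.html", "todo.html"],
}


def guess_uri(source_path: str) -> str:
    """File-centric guess: same path, with the extension swapped to .html."""
    return str(PurePosixPath(source_path).with_suffix(".html"))


def guess_is_served(source_path: str) -> bool:
    """True only if the guessed URI is one the site really serves."""
    return guess_uri(source_path) in SERVED_PAGES.get(source_path, [])


print(guess_uri("status.xml"))        # status.html
print(guess_is_served("status.xml"))  # False
```

The guess is wrong here because one source file feeds two served pages, and
neither of them shares its name.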

Another example: the Forrest site pulls in content from an external RSS feed
(the 'forrest-issues.html' page, currently commented out).  This RSS is
seamlessly merged with local content, and users would expect it to be
indexed like any other content.

Yet another example: Cocoon allows XML content to be pulled from all sorts
of weird sources (CVS, XML databases) simply by changing a URL. These
couldn't be indexed by a file-centric indexer.


I think the 'lucene' block in Cocoon takes the right approach to this
problem; it asks Cocoon for the content 'view' of a page, then asks for the
links 'view', and crawls each of the returned links, thereby recursively
covering the whole site.

You can see this View support for yourself, if you type 'forrest run' in a
project, and request:

http://localhost:8888/index.html?cocoon-view=links  (links for a page)

Content views aren't defined by default, but one is very easy to define for
XML content.
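
The crawl described above could be sketched roughly as follows, in Python.
This is not the code of Cocoon's 'lucene' block. It assumes that a view
named "content" has been defined in the sitemap, and that the links view
returns plain text with one link per line. The crawl function, its
site_root parameter and the injected fetch callable are all hypothetical
names chosen for illustration.

```python
from collections import deque
from typing import Callable, Dict
from urllib.parse import urldefrag, urljoin


def crawl(start_url: str, site_root: str,
          fetch: Callable[[str], str]) -> Dict[str, str]:
    """Breadth-first crawl returning {page URL: content-view text}.

    `fetch` is a caller-supplied function that returns the response body
    for a URL; it is injected so the sketch does not depend on any
    particular HTTP client.
    """
    indexed: Dict[str, str] = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue:
        url = queue.popleft()
        # Assumption: a view named "content" exists for every page.
        indexed[url] = fetch(url + "?cocoon-view=content")
        # Assumption: the links view lists one link per line.
        for line in fetch(url + "?cocoon-view=links").splitlines():
            link = line.strip()
            if not link:
                continue
            # Resolve relative links and drop any #fragment.
            target, _fragment = urldefrag(urljoin(url, link))
            # Stay inside the site and visit each page only once.
            if target.startswith(site_root) and target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed
```

Against a running 'forrest run' instance, fetch could be a thin wrapper
around urllib.request.urlopen, and the returned dictionary would then be
handed to whatever builds the search index.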


But what you've done is sufficient for probably 80% of Forrest sites, and
I'll be using it myself, so thanks :)


--Jeff


> Thanks.
>  
> Ramon
>  



