forrest-dev mailing list archives

From Jeff Turner <je...@apache.org>
Subject Re: Lucene Search
Date Thu, 14 Aug 2003 12:35:10 GMT
On Wed, Aug 13, 2003 at 04:21:35PM +0200, Ramon Prades wrote:
> Hi all
>  
> I have added Lucene to Forrest, so it is possible to do searches in forrest
> sites. The attached zip has got all the source files. There is a "build.xml"
> so ant will upgrade forrest automatically. It needs the latest source code
> from the CVS.
>  
> Please check "readme.txt" for more details.
>  
> I'm going to be working with this for a while (this is only a very first
> version but needs lots of work), so any feedback would be appreciated.

Great stuff! :)  Very nicely packaged too.  I ran the script and it all
worked perfectly.  Only glitch is that when a search is run from a
subdirectory, the path to search.cmd is wrong.

My only concern is a rather long-term one: the indexer uses the raw XML
files directly, and thereby assumes a 1-1 mapping from the filesystem to
the URI space.  With Cocoon, the two are completely separated and need
not correspond.

For instance, we have a status.xml file containing content, which is
split up and served as changes.html and todo.html.  The lucene indexer's
guess of status.html would be wrong.
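
To make the mismatch concrete, here is a minimal Python sketch.  The
guess_uri() helper and the 'served' table are hypothetical, made up for
illustration; they are not taken from the attached indexer or from
Forrest.  Only the status.xml example itself comes from the case above.

```python
from pathlib import PurePosixPath


def guess_uri(xml_path):
    """File-centric guess: assume foo.xml is served as foo.html."""
    return str(PurePosixPath(xml_path).with_suffix(".html"))


# What the sitemap actually serves for each source file
# (hypothetical table; the status.xml entry mirrors the example above).
served = {
    "index.xml": ["index.html"],
    "status.xml": ["changes.html", "todo.html"],
}

for source, uris in served.items():
    guess = guess_uri(source)
    verdict = "ok" if guess in uris else "wrong, actually served as " + ", ".join(uris)
    print(source, "->", guess, "(" + verdict + ")")
```

The guess holds for index.xml but fails for status.xml, because nothing
on the filesystem says that one source file becomes two pages.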

Another example: the Forrest site pulls in content from an external RSS
feed (the 'forrest-issues.html' page, currently commented out).  This RSS
is seamlessly merged with local content, and users would expect it to be
indexed like any other content.

Yet another example: Cocoon allows XML content to be pulled from all
sorts of weird sources (CVS, XML databases) simply by changing a URL.
These couldn't be indexed by a file-centric indexer.


I think the 'lucene' block in Cocoon takes the right approach to this
problem; it asks Cocoon for the content 'view' of a page, then asks for
the links 'view', and crawls each of the returned links, thereby
recursively covering the whole site.

You can see this View support for yourself, if you type 'forrest run' in
a project, and request:

http://localhost:8888/index.html?cocoon-view=links  (links for a page)

Content views aren't defined by default, but they're very easy to define
for XML content.
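
The crawl described above can be sketched in a few lines of Python.
This is only an illustration, not the code of Cocoon's lucene block.
It assumes that a content view has been defined and is reachable as
cocoon-view=content, that the links view returns one link per line, and
that only links on the starting host should be followed.  The names
view_url, http_fetch and crawl are invented for the sketch.

```python
from urllib.parse import urldefrag, urlencode, urljoin, urlsplit, urlunsplit
from urllib.request import urlopen


def view_url(page_url, view):
    """Append the cocoon-view request parameter to a page URL."""
    parts = urlsplit(page_url)
    query = urlencode({"cocoon-view": view})
    if parts.query:
        query = parts.query + "&" + query
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def http_fetch(page_url, view):
    """Fetch one view of a page over HTTP (needs a running 'forrest run')."""
    with urlopen(view_url(page_url, view)) as response:
        return response.read().decode("utf-8")


def crawl(start_url, fetch=http_fetch):
    """Return {page URL: content-view text} for pages reachable from start_url."""
    host = urlsplit(start_url).netloc
    pages = {}
    queue = [start_url]
    while queue:
        url = queue.pop()
        if url in pages:
            continue  # already visited
        pages[url] = fetch(url, "content")
        # Assumption: the links view lists one link per line.
        for line in fetch(url, "links").splitlines():
            link = line.strip()
            if not link:
                continue
            target, _fragment = urldefrag(urljoin(url, link))
            if urlsplit(target).netloc == host:  # stay on this site
                queue.append(target)
    return pages
```

With 'forrest run' active and a content view defined,
crawl("http://localhost:8888/index.html") would return the text to hand
to the indexer for each page.  Because every page is requested through
Cocoon, split files, merged RSS and non-file sources are covered the
same way as ordinary pages.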


But what you've done is sufficient for probably 80% of Forrest sites, and
I'll be using it myself, so thanks :)


--Jeff


> Thanks.
>  
> Ramon
>  
