forrest-dev mailing list archives

From Ross Gardler <rgard...@apache.org>
Subject Re: Forrest as an XML repository
Date Fri, 03 Jun 2005 10:40:38 GMT
Juan Jose Pablos wrote:
> FYI:
> 
> Ricardo Beltran wrote:

I've CC'd Ricardo on this reply; please use "reply all".

...

>> My questions are: do you think that Forrest is an appropriate framework
>> for this purpose? And do you think that Lucene or
>> Google will do the job of indexing about 5 GB of XML
>> files?

I can't comment with authority on the suitability of Google or Lucene 
for this, as I have no experience with them. My gut tells me that this 
is not the optimal solution.

I do have a project that has around 8 GB of dynamic data being published 
via the Forrest webapp.

The solution I employed, which appears to be working well, was to store 
the data in an XML-enabled database. In this case we used Oracle, but we 
have successfully used Xindice and eXist in similar, smaller projects in 
the past. I wrote a custom generator to retrieve the data from the DBMS.

It should be noted that Cocoon has some database components that can be 
utilised (the results of some early experiments I did with these 
components are in the whiteboard plugin 
org.apache.forrest.plugin.Database). I never completed work on this 
plugin, not because of a problem with it, but because additional 
requirements made it easier to build a custom generator (our requests 
were also dependent on live data from sensor readings over an RS232 port).

The system has now been running for about 3 months and we are very happy 
with it. Because we are using a database server as the repository, we 
get all the indexing and optimisation provided by that server. We also 
have the benefit of a very expressive and mature query language.

Of course, this solution requires that you run the system dynamically 
(as the Forrest webapp). Using Google to index your site would allow you 
to generate a static site instead. Trying to build a static site from 
5 GB of data would be a wonderful stress test; if you try it, please 
report your findings to us.

Ross
