cocoon-users mailing list archives

From Andrew Franz <>
Subject Extending DirectoryGenerator
Date Sun, 05 Jun 2005 22:07:17 GMT
I am thinking about a simple CMS (Content Management System) which would 
have the following features:
1. The ability to list MS-Office files along with their <SummaryInformation> 
attributes (using Jakarta POI), to list image files (basically by cloning 
the functionality in ImageDirectoryGenerator), and to be extended to other 
commonly used document formats such as PDF.
2. The output of #1 would be used as input to build a Lucene index.
3. The Lucene index would be used to search an intranet by Author, 
Title, Subject, etc.

This would mean that content creators in the organisation would 
categorise documents simply by updating <SummaryInformation> 
('Properties' in MS-Office applications) and then uploading the file 
(the current implementation requires them to update a database separate 
from the document itself). The Cocoon application would then categorise 
the document automatically, either by using Lucene or directly from the 
SummaryInformation. Indexing would apply only to the header/meta 
information; full-text indexing of the content is not required.
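Since only the summary properties are indexed, the index in #2 and #3 boils down to field-scoped lookups over a handful of fields. As a rough illustration of that shape only, here is a toy in-memory stand-in written with java.util rather than Lucene. MetadataIndex and its methods are hypothetical names, and the tokenising (lower-casing, splitting on anything that is not a letter or digit) is an assumption. In Lucene itself this would roughly correspond to one Document per file with one field per summary property.

```java
import java.util.*;

/**
 * Toy in-memory stand-in for a metadata-only index (NOT Lucene).
 * Each document is represented only by its summary fields, e.g. Title,
 * Author, Subject; document bodies are never read or tokenised.
 */
class MetadataIndex {
    // field name -> lower-cased term -> sorted set of document paths
    private final Map<String, Map<String, Set<String>>> postings = new HashMap<>();

    /** Adds one document's summary fields (field name -> value) to the index. */
    void add(String path, Map<String, String> fields) {
        for (Map.Entry<String, String> e : fields.entrySet()) {
            Map<String, Set<String>> terms =
                postings.computeIfAbsent(e.getKey(), k -> new HashMap<>());
            // Split on runs of characters that are neither letters nor digits.
            for (String term : e.getValue().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!term.isEmpty()) {
                    terms.computeIfAbsent(term, k -> new TreeSet<>()).add(path);
                }
            }
        }
    }

    /** Returns the paths whose given field contains the given word (case-insensitive). */
    Set<String> search(String field, String word) {
        return Collections.unmodifiableSet(
            postings.getOrDefault(field, Collections.emptyMap())
                    .getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet()));
    }
}
```

Keeping one term map per field is what makes "Author contains X" a different query from "Title contains X", which is the kind of search described in #3.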

The question (to experienced Cocoon developers) is what is the preferred 
method of implementation?

Option 1. Extend DirectoryGenerator in the same way that 
ImageDirectoryGenerator does, but add support for new file types.
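For Option 1, the extension point is essentially: emit the usual per-file attributes, then let a type-specific extractor add more, much as ImageDirectoryGenerator adds image-specific attributes. The Java below is a standalone, hypothetical sketch rather than Cocoon code (TypedDirectoryLister, register and attributesFor are made-up names). The only real-world detail it relies on is the 8-byte OLE2 compound-document signature that binary MS-Office files (.doc, .xls, .ppt, .vsd) start with; a real Office extractor would instead read the SummaryInformation stream with Jakarta POI's HPSF classes.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.function.Function;

/**
 * Hypothetical sketch of Option 1 (not Cocoon code): a directory lister
 * that adds type-specific attributes per file. Extractors are registered
 * by file extension, so new formats (e.g. PDF) plug in without touching
 * the common code.
 */
class TypedDirectoryLister {
    // First 8 bytes of an OLE2 compound document (binary .doc/.xls/.ppt/.vsd).
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };

    private final Map<String, Function<Path, Map<String, String>>> extractors = new HashMap<>();

    /** Registers an extractor for a (case-insensitive) file extension. */
    void register(String extension, Function<Path, Map<String, String>> extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Common attributes for every file, plus any type-specific ones. */
    Map<String, String> attributesFor(Path file) throws IOException {
        Map<String, String> attrs = new LinkedHashMap<>();
        String name = file.getFileName().toString();
        attrs.put("name", name);
        attrs.put("size", Long.toString(Files.size(file)));

        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
        Function<Path, Map<String, String>> extractor = extractors.get(ext);
        if (extractor != null) {
            attrs.putAll(extractor.apply(file));
        }
        return attrs;
    }

    /** True if the file starts with the OLE2 compound-document signature. */
    static boolean isOle2(Path file) {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            byte[] head = new byte[OLE2_MAGIC.length];
            in.readFully(head); // throws EOFException if the file is shorter than 8 bytes
            return Arrays.equals(head, OLE2_MAGIC);
        } catch (IOException e) { // includes EOFException
            return false;
        }
    }
}
```

Registering, say, a "doc" extractor that returns a format attribute only when isOle2 is true keeps the per-type logic out of the directory walk itself, which is roughly what subclassing DirectoryGenerator would buy.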

Option 2. Use DirectoryGenerator 'as is', but augment it with a 
HeaderGenerator per file/mimetype and then aggregate the results so that 
the output is similar to #1.

Option 3. Tell the users to 'Save As' MS-Office documents in an XML 
format and use XSLT to extract the summary information. For example, 
a Visio binary drawing (VSD) can be saved as XML (VDX) and the same 
information extracted via XSLT.
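Option 3 needs nothing beyond a stylesheet. Below is a minimal sketch using the JDK's built-in javax.xml.transform API. The input shape (a DocumentProperties element with Title, Author and Subject children, no namespace) is a simplified assumption; the XML formats actually saved by Office/Visio put these elements in their own namespaces, which the stylesheet would have to declare and match. SummaryXslt and extract are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/**
 * Sketch of Option 3: pull summary fields out of an XML-saved document
 * with XSLT. The element names matched here are a simplified, hypothetical
 * input shape, not the real (namespaced) Office/Visio XML vocabulary.
 */
class SummaryXslt {
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<summary>"
      + "<title><xsl:value-of select='//DocumentProperties/Title'/></title>"
      + "<author><xsl:value-of select='//DocumentProperties/Author'/></author>"
      + "<subject><xsl:value-of select='//DocumentProperties/Subject'/></subject>"
      + "</summary>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to an XML document held in a string. */
    static String extract(String documentXml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(documentXml)), new StreamResult(out));
        return out.toString();
    }
}
```

Inside Cocoon the same stylesheet would presumably just be a map:transform step in the pipeline, with no Java at all; the trade-off is that it depends on users remembering to save in the XML format.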

All of the above are feasible and make no difference to the user 
interface, so the question is mainly one of performance.

Has anyone gone down this route? Are there any pitfalls I need to be 
aware of? For the experienced Cocoon developers, what is your gut-feel 
about which is the preferred option?

Replies much appreciated.
