cocoon-users mailing list archives

From "Rajesh Parekh" <>
Subject Cocoon : Content Transformation from WORD/PDF/EXCEL to XML
Date Tue, 03 Sep 2002 13:50:42 GMT
I have a requirement to convert hundreds of unstructured documents
(Word, PDF, Excel) into a structured repository containing XML metadata
for each document along with the documents themselves.
I need to parse each of these documents and extract the relevant
information to build an XML metadata document for each one.
The XML structured metadata of the underlying document will contain
fields like Keywords, Category, Doc Name, 
Author etc. 
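
To make the target format concrete, here is a minimal sketch of such a
metadata record, built with only the JDK's standard DOM and transformer
APIs. The class name, method name, and element names (document-metadata,
doc-name, keywords, and so on) are assumptions made up for illustration;
they do not come from Cocoon or POI, and extracting the field values
from the source files is not shown.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: the names here are illustrative assumptions only.
final class MetadataSketch {

    // Builds one XML metadata record and returns it as a string.
    static String buildMetadata(String docName, String author,
                                String category, String[] keywords) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("document-metadata");
            doc.appendChild(root);

            appendText(doc, root, "doc-name", docName);
            appendText(doc, root, "author", author);
            appendText(doc, root, "category", category);

            // One <keyword> child per keyword, grouped under <keywords>.
            Element keywordList = doc.createElement("keywords");
            root.appendChild(keywordList);
            for (String keyword : keywords) {
                appendText(doc, keywordList, "keyword", keyword);
            }

            // Serialize the DOM tree without the XML declaration.
            Transformer serializer =
                    TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(
                    OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build metadata", e);
        }
    }

    // Appends <name>value</name> to the given parent element.
    private static void appendText(Document doc, Element parent,
                                   String name, String value) {
        Element child = doc.createElement(name);
        child.setTextContent(value);
        parent.appendChild(child);
    }
}
```

A call such as buildMetadata("report.doc", "A. Author", "Finance",
new String[] {"budget", "q3"}) would yield one record per source
document, which could then be stored next to the original file.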
Is it possible to use Cocoon and/or POI to do this? If so, how would I
use Cocoon to do the extraction?
I am new to Cocoon and am still trying to understand the world of
transformers, generators, etc.
Also, could I use Lucene to index the XML documents and build a search
engine around them?
I would like to know about the possible ways to do this. 
