cocoon-users mailing list archives

From "Rajesh Parekh" <rajesh.par...@smartdot.net>
Subject Cocoon : Content Transformation from WORD/PDF/EXCEL to XML
Date Tue, 03 Sep 2002 13:50:42 GMT
Hi, 
 
I need to convert hundreds of unstructured documents in Word, PDF,
text, and email formats into a structured repository that holds XML
metadata for each document along with the documents themselves.
 
I need to parse each of these documents and extract the relevant
information to build an XML metadata document for each one.
 
The XML metadata for each document will contain fields such as
Keywords, Category, Doc Name, and Author.
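
To make the target format concrete, here is a minimal sketch of the kind
of metadata record I have in mind. It uses only the JDK's DOM and
transformer APIs, not Cocoon or POI. The class name MetadataSketch, the
element names (DocumentMetadata, DocName, Keyword, and so on), and the
sample values are all hypothetical; the real schema is still open.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class MetadataSketch {

    // Builds one metadata record as an XML string.
    // All element names here are assumptions made for illustration.
    public static String toXml(String docName, String author,
                               String category, List<String> keywords) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("DocumentMetadata");
        doc.appendChild(root);

        addField(doc, root, "DocName", docName);
        addField(doc, root, "Author", author);
        addField(doc, root, "Category", category);

        // Keywords are repeated, so they get a wrapper element.
        Element keywordList = doc.createElement("Keywords");
        root.appendChild(keywordList);
        for (String keyword : keywords) {
            addField(doc, keywordList, "Keyword", keyword);
        }

        // Serialize the DOM tree without the XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Appends <name>value</name> under the given parent element.
    private static void addField(Document doc, Element parent,
                                 String name, String value) {
        Element field = doc.createElement(name);
        field.setTextContent(value);
        parent.appendChild(field);
    }

    public static void main(String[] args) throws Exception {
        // Sample values only; real values would come from the parsed document.
        System.out.println(toXml("report.doc", "Jane Doe", "Finance",
                Arrays.asList("budget", "forecast")));
    }
}
```

The values passed to toXml would come from whatever extraction step
reads the Word, PDF, or email source; that step is not shown here.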
 
Is it possible to use Cocoon and/or POI to do this? If so, how would I
use Cocoon to do the extraction?
 
I am new to Cocoon and am still trying to understand the world of
transformers, generators, etc.
 
Also, could I use Lucene to index the XML documents and build a search
engine around them?
 
I would like to know about the possible ways to do this. 
 
regards
 
rajesh. 
 
