cocoon-users mailing list archives

From ad...@implements.be
Subject Re: Using the Cocoon pipeline outside web apps
Date Mon, 16 Dec 2002 11:00:43 GMT
It should be possible, since an OpenOffice file is just a set of XML files
zipped into one archive. I did the opposite: I create one XML file from the zip
file in order to publish it through Cocoon. I used Perl; here's my script.
It doesn't do what you want it to do, but hey, it's Open Source, right?


#!/usr/bin/perl
# Written by Yves Vindevogel - yves.vindevogel@implements.be
# 14-Nov-2002

# This script opens an OpenOffice document (which is a zip file)
# and merges all the XML files in the document into one XML file
#
# Usage: oo2xml inputfile outputfile


# Check the arguments and make sure the input file exists
die "Usage: oo2xml inputfile outputfile\n" unless @ARGV == 2 ;
unless (-e $ARGV[0])
{	die "oo2xml error: Could not find input file\n" ;
} ;

# Run a system command to unzip the file into a temp XML file.
# unzip -p extracts the zip members to standard output (the pipe).
# Only the *.xml members are extracted: besides its XML files
# (content.xml, styles.xml, meta.xml, settings.xml and
# META-INF/manifest.xml), an OpenOffice file also contains a plain-text
# "mimetype" member and possibly pictures, which must stay out of the output.
# The pipe is redirected into a file, so the temp file contains
# all the XML content, one document after the other.
# This is not a valid XML file yet (it has several XML declarations
# and several root elements), so some modifications are done below.
system ("unzip -p '$ARGV[0]' '*.xml' > /tmp/tmp.xml") == 0
	|| die "oo2xml error: Could not unzip the input file\n" ;

# Open the temp xml file
open (tmp, "/tmp/tmp.xml")
	|| die "oo2xml error: Could not open temp file\n" ;

# Open a second temp file to split the tags.
# If the tags are not split and a <!...> or <?...?> tag shares a line
# with other tags, the whole line is dropped below, losing content.
# Therefore, in a first pass, adjacent tags are rewritten to separate lines.
open (tmp2, "> /tmp/tmp2.xml")
	|| die "oo2xml error: Could not open temp split file\n" ;

# Loop through the lines and insert a \n between every adjacent > and <
while ($line = <tmp>)
{
	$line =~ s/></>\n</g ;
	print tmp2 $line ;
} ;

# Close them
close tmp2 ;
close tmp ;

# Open the filtered input file
open (tmp, "/tmp/tmp2.xml")
	|| die "oo2xml error: Could not open split file\n" ;

# Open the output file
open (xml, '>', $ARGV[1])
	|| die "oo2xml error: Could not open output file\n" ;

# Print the office:document tag
# The complete document needs to be enclosed by one root element
# The root element will thus be <office:document>
print xml "<?xml version=\x221.0\x22 encoding=\x22UTF-8\x22?>\n" ; # \x22 = "
print xml "<office:document " ;
print xml "xmlns:office=\x22http://openoffice.org/2000/office\x22 " ;
print xml "xmlns:style=\x22http://openoffice.org/2000/style\x22 " ;
print xml "xmlns:text=\x22http://openoffice.org/2000/text\x22 " ;
print xml "xmlns:table=\x22http://openoffice.org/2000/table\x22 " ;
print xml "xmlns:draw=\x22http://openoffice.org/2000/drawing\x22 " ;
print xml "xmlns:fo=\x22http://www.w3.org/1999/XSL/Format\x22 " ;
print xml "xmlns:xlink=\x22http://www.w3.org/1999/xlink\x22 " ;
print xml "xmlns:number=\x22http://openoffice.org/2000/datastyle\x22 " ;
print xml "xmlns:svg=\x22http://www.w3.org/2000/svg\x22 " ;
print xml "xmlns:chart=\x22http://openoffice.org/2000/chart\x22 " ;
print xml "xmlns:dr3d=\x22http://openoffice.org/2000/dr3d\x22 " ;
print xml "xmlns:math=\x22http://www.w3.org/1998/Math/MathML\x22 " ;
print xml "xmlns:form=\x22http://openoffice.org/2000/form\x22 " ;
print xml "xmlns:script=\x22http://openoffice.org/2000/script\x22 " ;
print xml "xmlns:config=\x22http://openoffice.org/2001/config\x22 " ;
print xml "xmlns:meta=\x22http://openoffice.org/2000/meta\x22 " ;
print xml "xmlns:manifest=\x22http://openoffice.org/2001/manifest\x22 " ;
print xml "xmlns:dc=\x22http://purl.org/dc/elements/1.1/\x22 " ;

print xml ">\n" ;

# Loop through the lines in the split temp file
# Lines with XML declarations, other processing instructions
# or DOCTYPE declarations are omitted
while ($line = <tmp>)
{
	# Flag that tells whether the line must be written
	$ok = 1 ;

	# Two reasons not to write the line: processing instructions
	# (including the XML declaration) and DOCTYPE declarations.
	# Note that the <! test also drops comments and CDATA sections.
	if ($line =~ /<\x3F/) { $ok = 0; } ; # \x3F = ?
	if ($line =~ /<!/) { $ok = 0; } ;

	# Remove any xmlns attributes from the line:
	# all the namespace declarations are already written in the root element,
	# and leaving them in causes errors
	if ($line =~ /xmlns/)
	{
		# Split on spaces
		@tags = split / /, $line ;

		# Loop through the tokens.
		# A token containing xmlns is dropped, but if it also holds the
		# opening < or the closing > of the element, that delimiter is written.
		# Every other token is written followed by a space.
		foreach $tag (@tags)
		{
			if ($tag =~ /xmlns/)
			{
				if ($tag =~ /</) { print xml "<"} ;
				if ($tag =~ />/) { print xml ">\n"} ;
			}
			else
			{
				print xml $tag, " ";
			}  ;
		} ;

		# The line has already been written
		$ok = 0 ;
	} ;

	# Write the line if $ok is still 1
	unless ($ok == 0) { print xml $line ; } ;
} ;

# Write the document end tag and close the files
print xml "</office:document>\n" ;
close xml ;
close tmp ;

# Delete the temp files
unlink ("/tmp/tmp.xml")
	|| warn "oo2xml warning: Temp file could not be deleted\n" ;

unlink ("/tmp/tmp2.xml")
	|| warn "oo2xml warning: Temp split file could not be deleted\n" ;
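
The filtering loop above works line by line, so it depends on how the first pass happens to split the tags. A minimal sketch of the same merge step done on whole documents instead is shown below, assuming each XML member is already available as a string. `merge_oo_parts` is a hypothetical name, and the regular expressions assume the simple machine-generated layout of OpenOffice 1.x members (no `>` inside attribute values, no internal DTD subset); it is not a general XML processor, and a real XML parser such as XML::LibXML would be more robust.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): merge several XML
# documents, given as strings, under one root element. Regex-based, so it
# assumes the simple machine-generated layout of OpenOffice 1.x members.
sub merge_oo_parts {
    my ($root_tag, $ns_decls, @docs) = @_;
    my $out = qq{<?xml version="1.0" encoding="UTF-8"?>\n<$root_tag $ns_decls>\n};
    for my $doc (@docs) {
        # Drop the XML declaration and any other processing instructions
        $doc =~ s/<\?.*?\?>//gs;
        # Drop DOCTYPE declarations (assumes no internal DTD subset)
        $doc =~ s/<!DOCTYPE[^>]*>//g;
        # Remove namespace declarations from the first element only,
        # because the new root element already declares them
        $doc =~ s{^(\s*<[^\s>/]+)([^>]*)>}{
            my ($open, $attrs) = ($1, $2);
            $attrs =~ s/\s+xmlns(?::[\w.-]+)?="[^"]*"//g;
            "$open$attrs>";
        }e;
        # Trim the leading whitespace left by the removals
        $doc =~ s/^\s+//;
        $out .= "$doc\n";
    }
    return "$out</$root_tag>\n";
}
```

Called as `merge_oo_parts('office:document', $namespace_declarations, @xml_strings)`, it returns the whole merged document as one string, ready to be printed to the output file. Unlike the line-based loop, it strips `xmlns` attributes only from the first element of each member and keeps comments.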

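The script relies on an external `unzip` command and fixed file names in /tmp. In current Perl, the extraction can also be done in-process with IO::Uncompress::Unzip, which has been bundled with Perl since 5.10 (it did not exist when this script was written). The sketch below, assuming that module, returns only the `*.xml` members, so the `mimetype` entry and any pictures are skipped; `read_xml_members` is a hypothetical helper name, not part of any library.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);

# Hypothetical helper: return the *.xml members of a zip archive as
# [name, content] pairs, in archive order. $source may be a file name
# or a reference to a scalar holding the zip data.
sub read_xml_members {
    my ($source) = @_;
    my $unzip = IO::Uncompress::Unzip->new($source)
        or die "oo2xml error: Could not open zip: $UnzipError\n";
    my @members;
    my $status;
    # nextStream moves to the next member: 1 = found, 0 = no more, -1 = error
    for ($status = 1; $status > 0; $status = $unzip->nextStream) {
        my $name = $unzip->getHeaderInfo->{Name};
        my ($content, $buffer) = ('', '');
        # read returns the number of bytes read, 0 at the end of the member,
        # or a negative value on error; $buffer is overwritten on each call
        while (($status = $unzip->read($buffer)) > 0) {
            $content .= $buffer;
        }
        last if $status < 0;
        push @members, [$name, $content] if $name =~ /\.xml\z/;
    }
    die "oo2xml error: Could not read zip: $UnzipError\n" if $status < 0;
    $unzip->close;
    return @members;
}
```

Each returned pair holds the member name and its raw content (content.xml, styles.xml, meta.xml and so on), ready to be filtered and wrapped in the `<office:document>` root element just like the temp file in the script.
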

Quoting Olivier Mengué <dolmen@users.sourceforge.net>:

> Hi,
> 
> I'm working on a project that will generate OpenOffice.org documents from
> data extracted from a database. Our aim is to automate the publishing of
> the program of hikes for my hikers' association. It is currently done with a
> Microsoft Word document merge, which is not satisfactory. PDF is not an option
> as publishers have to do additional editing after the automatic step.
> The output document will be many pages long, so we want to process it in batch
> instead of as a web application.
> 
> As OpenOffice.org document format is XML, I would like to reuse the Cocoon
> pipeline with an ESQL transformer from a simple Java application.
> 
> My questions are:
> - Is it possible? I mean, is it possible to reuse just the pipeline in a
> standard Java application, without the sitemap and servlet stuff, without
> too much code or too many dependencies? The pipeline would be either
> hard-coded or specified with a simpler sitemap-like configuration file.
> - How? The package org.apache.cocoon.components.pipeline seems interesting,
> but I don't know which class to use and how to build a simple pipeline with
> a generator, a transformer and a serializer. Then, how do I feed the pipeline?
> 
> Could you point me to the important classes, and the order in which to create them?
> 
> 
> Thank you for your help,
> 
> Olivier Mengué
> 
> 
> ---------------------------------------------------------------------
> Please check that your question  has not already been answered in the
> FAQ before posting.     <http://xml.apache.org/cocoon/faq/index.html>
> 
> To unsubscribe, e-mail:     <cocoon-users-unsubscribe@xml.apache.org>
> For additional commands, e-mail:   <cocoon-users-help@xml.apache.org>
