Mailing-List: contact cocoon-dev-help@xml.apache.org; run by ezmlm
Delivered-To: mailing list cocoon-dev@xml.apache.org
Received: (qmail 44675 invoked from network); 15 Feb 2000 20:41:48 -0000
Received: from unknown (HELO arkin.exoffice.com) (207.33.160.68) by locus.apache.org with SMTP; 15 Feb 2000 20:41:48 -0000
Received: from exoffice.com (IDENT:arkin@arkin.exoffice.com [192.168.1.4]) by arkin.exoffice.com (8.9.3/8.9.3) with ESMTP id MAA02096; Tue, 15 Feb 2000 12:40:46 -0800
Sender: arkin@arkin.exoffice.com
Message-ID: <38A9B9CE.5F419F2A@exoffice.com>
Date: Tue, 15 Feb 2000 12:40:46 -0800
From: Assaf Arkin
Organization: Exoffice
X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.14 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Kay Michael
CC: "'Scott Boag/CAM/Lotus'", James Clark, Steve Muench, Adam Winer, Eduardo.Pelegrillopart@eng.sun.com, sax@megginson.com, cocoon-dev@xml.apache.org, xalan-dev@xml.apache.org
Subject: Proposal for Serializer API
References: <93CB64052F94D211BC5D0010A800133101FDEA33@wwmess3.bra01.icl.co.uk>
Content-Type: multipart/mixed; boundary="------------2B1A06767E2042F7759536A8"

This is a multi-part message in MIME format.
--------------2B1A06767E2042F7759536A8
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

The purpose of the serializer API is to enable SAX events and DOM
documents to be written into a document. The serializer API allows any
implementation supporting any number of output formats (XML, HTML, PDF,
PNG, etc).

Originally this API was named serializer since it serializes a DOM
document into the output document, and is consistent with the term
'serializer' that XML and DOM specifications will refer to.

The serializer API assumes that a serializer implementation will exist,
whether or not an XSLT processor implementation exists, hence my
proposal to place it in a separate package. It does not assume, however,
that any given serializer implementation will be available.
Four output methods are defined by default (XML, HTML, XHTML, TEXT), so
implementations are encouraged to support all four.

The serializer API defines the following interfaces and classes:

* SerializerFactory -- a way to acquire a new serializer for a given
  output method
* Serializer -- a way to set the output format and output stream for a
  serializer, and a way to obtain a DocumentHandler (SAX1),
  ContentHandler (SAX2) or DOMSerializer (DOM L1) for serializing
* OutputFormat -- basic set of output format properties based on
  xsl:output. Implementations may use an extended class with additional
  properties (e.g. indentation level, line separator, etc)
* Method -- a list of the four common output method names. Additional
  methods should follow the uri:name pattern.
* QName -- QNames are used for CDATA and non-escaping elements

An implementation could roll all three serializers (SAX1, SAX2, DOM)
into one, or provide different implementations through the Serializer
interface.

By definition a Serializer is reusable. By changing the output
stream/output format, the same serializer can be used over and over.
However, a serializer is not multi-threaded -- it can only serialize one
document at any given time.

Currently there is no way to control namespace behavior. The serializer
will use the prefixes/URIs provided in the DOM/SAX events.

An XSLT processor may choose to use a serializer implementation, provide
a serializer implementation, or use its internal mechanisms and ignore
this package altogether.
However, by supporting OutputFormat, the XSLT processor allows the
following to happen:

* A stylesheet is read using the XSLT processor and the OutputFormat is
  returned to the application
* A document is transformed using the XSLT processor, and the resulting
  document (DOM or SAX) is returned to the application
* The application performs further processing on the document, caching,
  filtering, etc
* The application sends the document (DOM or SAX) to a serializer using
  the OutputFormat read from the stylesheet

Comments?

arkin

-- 
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org

--------------2B1A06767E2042F7759536A8
Content-Type: text/plain; charset=us-ascii; name="DOMSerializer.java"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="DOMSerializer.java"

package org.xml.serialize;

import java.io.IOException;
import org.w3c.dom.Element;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;

/**
 * Interface for a DOM serializer implementation.
 *
 * @version
 * @author Scott Boag
 * @author Assaf Arkin
 */
public interface DOMSerializer {

    /**
     * Serializes the DOM element. Throws an exception only if an I/O
     * exception occurred while serializing.
     *
     * @param elem The element to serialize
     * @throws IOException An I/O exception occurred while serializing
     */
    public void serialize( Element elem )
        throws IOException;

    /**
     * Serializes the DOM document. Throws an exception only if an I/O
     * exception occurred while serializing.
     *
     * @param doc The document to serialize
     * @throws IOException An I/O exception occurred while serializing
     */
    public void serialize( Document doc )
        throws IOException;

    /**
     * Serializes the DOM document fragment. Throws an exception only
     * if an I/O exception occurred while serializing.
     *
     * @param frag The document fragment to serialize
     * @throws IOException An I/O exception occurred while serializing
     */
    public void serialize( DocumentFragment frag )
        throws IOException;

}

--------------2B1A06767E2042F7759536A8
Content-Type: text/plain; charset=us-ascii; name="Method.java"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="Method.java"

package org.xml.serialize;

/**
 * Names of the four default output methods.
 * <p>
 * Four default output methods are defined: XML, HTML, XHTML and TEXT.
 * Serializers may support additional output methods. The names of
 * these output methods should be encoded as namespace:local.
 *
 * @version
 * @author Assaf Arkin
 * @see OutputFormat
 */
public final class Method {

    /**
     * The output method for XML documents: xml.
     */
    public static final String XML = "xml";

    /**
     * The output method for HTML documents: html.
     */
    public static final String HTML = "html";

    /**
     * The output method for XHTML documents: xhtml.
     */
    public static final String XHTML = "xhtml";

    /**
     * The output method for text documents: text.
     */
    public static final String TEXT = "text";

}

--------------2B1A06767E2042F7759536A8
Content-Type: text/plain; charset=us-ascii; name="OutputFormat.java"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="OutputFormat.java"

package org.xml.serialize;

/**
 * The output format affects the manner in which a document is
 * serialized. The output format determines the output method,
 * encoding, indentation, document type, and various other properties
 * that affect the manner in which a document is serialized.
 * <p>
 * Once an output format has been handed to a serializer or XSLT
 * processor, the application should not attempt to reuse it. The
 * serializer or XSLT processor may modify the properties of the
 * output format object.
 * <p>
 * Implementations may provide classes that extend OutputFormat
 * with additional properties, e.g. indentation level, line separation,
 * namespace handlers, etc. An application may use these extra properties
 * by constructing an output format object based on the implementation
 * specified type.
 * <p>
 * OutputFormat has been modeled after the XSLT &lt;xsl:output&gt;
 * element declaration. However, it does not assume the existence of an
 * XSLT processor or a particular serializer.
 * <p>
 * Typical usage scenarios supported by OutputFormat:
 * <ul>
 * <li>The application constructs an OutputFormat object and
 *     passes it to the serializer
 * <li>The application constructs an OutputFormat object and
 *     passes it to the XSLT processor, overriding the properties
 *     specified in the stylesheet
 * <li>The XSLT processor constructs an OutputFormat object
 *     and passes it to the serializer
 * <li>The XSLT processor constructs an OutputFormat object
 *     from the stylesheet and returns it to the application; the
 *     application passes OutputFormat to the serializer
 * </ul>
 *
 * @version
 * @author Assaf Arkin
 * @author Keith Visco
 * @see Method
 */
public class OutputFormat {

    /**
     * Holds the output method specified for this document,
     * or null if no method was specified.
     *
     * @see Method
     */
    private String _method = Method.XML;

    /**
     * Specifies the version of the output method, null for the
     * default.
     */
    private String _version = null;

    /**
     * True if indentation is requested, false for no indentation.
     */
    private boolean _indent = false;

    /**
     * The encoding to use, if an output stream is used, null for
     * the default.
     */
    private String _encoding = null;

    /**
     * The specified media type or null.
     */
    private String _mediaType = null;

    /**
     * The specified document type system identifier, or null.
     */
    private String _doctypeSystemId = null;

    /**
     * The specified document type public identifier, or null.
     */
    private String _doctypePublicId = null;

    /**
     * True if the XML declaration should be omitted.
     */
    private boolean _omitXmlDeclaration = false;

    /**
     * List of element tag names whose text node children must
     * be output as CDATA.
     */
    private QName[] _cdataElements = new QName[ 0 ];

    /**
     * List of element tag names whose text node children must
     * be output unescaped.
     */
    private QName[] _nonEscapingElements = new QName[ 0 ];

    /**
     * True if spaces should be preserved in elements that do not
     * specify otherwise, or specify the default behavior.
     */
    private boolean _preserve = false;

    /**
     * Constructs a new output format with the default values.
     */
    public OutputFormat() {
    }

    /**
     * Constructs a new output format with the default values for
     * the specified method and encoding.
     *
     * @param method The specified output method
     * @param encoding The specified encoding
     * @param indenting True for indentation
     */
    public OutputFormat( String method, String encoding, boolean indenting ) {
        setMethod( method );
        setEncoding( encoding );
        setIndenting( indenting );
    }

    /**
     * Returns the method specified for this output format. See {@link
     * Method} for a list of the default methods.
     * Other methods should be of the format namespace:local. The
     * default is {@link Method#XML}.
     *
     * @return The specified output method
     */
    public String getMethod() {
        return _method;
    }

    /**
     * Sets the method for this output format. See {@link Method} for
     * a list of the default methods. Other methods should be of the
     * format namespace:local.
     *
     * @param method The output method, or null
     */
    public void setMethod( String method ) {
        _method = method;
    }

    /**
     * Returns the version for this output method. If no version was
     * specified, will return null and the default version number will
     * be used. If the serializer does not support that particular
     * version, it should default to a supported version.
     *
     * @return The specified method version, or null
     */
    public String getVersion() {
        return _version;
    }

    /**
     * Sets the version for this output method.
     *
     * @param version The output method version, or null
     */
    public void setVersion( String version ) {
        _version = version;
    }

    /**
     * Returns true if indentation was specified. If no indentation
     * was specified, returns false. A derived class may support
     * additional properties, e.g. indentation level, line width to
     * wrap at, tab/spaces, etc.
     *
     * @return True if indentation was specified
     */
    public boolean getIndent() {
        return _indent;
    }

    /**
     * Sets the indentation on and off. A derived class may support
     * additional properties, e.g. indentation level, line width to
     * wrap at, tab/spaces, etc.
     *
     * @param indent True specifies indentation
     */
    public void setIndenting( boolean indent ) {
        _indent = indent;
    }

    /**
     * Returns the specified encoding. If no encoding was specified,
     * the default is used. For XML and HTML the default would be
     * "UTF-8". For other output methods, the default encoding is
     * unspecified.
     *
     * @return The encoding
     */
    public String getEncoding() {
        return _encoding;
    }

    /**
     * Sets the encoding for this output method. Null means the
     * default encoding for the selected output method. For XML and
     * HTML the default would be "UTF-8".
     * For other output methods, the default encoding is unspecified.
     *
     * @param encoding The encoding, or null
     */
    public void setEncoding( String encoding ) {
        _encoding = encoding;
    }

    /**
     * Returns the specified media type. For each output method a
     * default media type will be used if one was not specified.
     *
     * @return The specified media type, or null
     */
    public String getMediaType() {
        return _mediaType;
    }

    /**
     * Sets the media type. For each output method a default media
     * type will be used if one was not specified.
     *
     * @param mediaType The specified media type
     */
    public void setMediaType( String mediaType ) {
        _mediaType = mediaType;
    }

    /**
     * Sets the document type public and system identifiers. If not
     * specified the document type will depend on the output method
     * (e.g. HTML, XHTML) or from some other mechanism (e.g. SAX
     * events, DOM DocumentType).
     *
     * @param publicId The public identifier, or null
     * @param systemId The system identifier, or null
     */
    public void setDoctype( String publicId, String systemId ) {
        _doctypePublicId = publicId;
        _doctypeSystemId = systemId;
    }

    /**
     * Returns the specified document type public identifier,
     * or null.
     */
    public String getDoctypePublicId() {
        return _doctypePublicId;
    }

    /**
     * Returns the specified document type system identifier,
     * or null.
     */
    public String getDoctypeSystemId() {
        return _doctypeSystemId;
    }

    /**
     * Returns true if the XML document declaration should
     * be omitted. The default is false.
     */
    public boolean getOmitXMLDeclaration() {
        return _omitXmlDeclaration;
    }

    /**
     * Sets XML declaration omitting on and off.
     *
     * @param omit True if XML declaration should be omitted
     */
    public void setOmitXMLDeclaration( boolean omit ) {
        _omitXmlDeclaration = omit;
    }

    /**
     * Returns a list of all the elements whose text node children
     * should be output as CDATA. Returns an empty array if no such
     * elements were specified.
     *
     * @return List of all CDATA elements
     */
    public QName[] getCDataElements() {
        return _cdataElements;
    }

    /**
     * Sets the list of elements for which text node children
     * should be output as CDATA.
     *
     * @param cdataElements List of all CDATA elements
     */
    public void setCDataElements( QName[] cdataElements ) {
        if ( cdataElements == null )
            _cdataElements = new QName[ 0 ];
        else
            _cdataElements = cdataElements;
    }

    /**
     * Returns a list of all the elements whose text node children
     * should be output unescaped (no character references). Returns
     * an empty array if no such elements were specified.
     *
     * @return List of all non escaping elements
     */
    public QName[] getNonEscapingElements() {
        return _nonEscapingElements;
    }

    /**
     * Sets the list of elements for which text node children
     * should be output unescaped (no character references).
     *
     * @param nonEscapingElements List of all non-escaping elements
     */
    public void setNonEscapingElements( QName[] nonEscapingElements ) {
        if ( nonEscapingElements == null )
            _nonEscapingElements = new QName[ 0 ];
        else
            _nonEscapingElements = nonEscapingElements;
    }

    /**
     * Returns true if the default behavior for this format is to
     * preserve spaces. All elements that do not specify otherwise
     * or specify the default behavior will be formatted based on
     * this rule. All elements that specify space preserving will
     * always preserve space.
     */
    public boolean getPreserveSpace() {
        return _preserve;
    }

    /**
     * Sets space preserving as the default behavior. The default is
     * space stripping and all elements that do not specify otherwise
     * or use the default value will not preserve spaces.
     *
     * @param preserve True if spaces should be preserved
     */
    public void setPreserveSpace( boolean preserve ) {
        _preserve = preserve;
    }

}

--------------2B1A06767E2042F7759536A8
Content-Type: text/plain; charset=us-ascii; name="QName.java"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="QName.java"

package org.xml.serialize;

/**
 * A qualified name.
 * A qualified name has a local name, a namespace
 * URI and a prefix (if known). A QName may also specify
 * a non-qualified name by having a null namespace URI.
 *
 * @version
 * @author Assaf Arkin
 */
public class QName {

    private String _localName;

    private String _namespaceURI;

    private String _prefix;

    /**
     * Constructs a new QName with the specified namespace URI and
     * local name.
     *
     * @param namespaceURI The namespace URI if known, or null
     * @param localName The local name
     */
    public QName( String namespaceURI, String localName ) {
        if ( localName == null )
            throw new IllegalArgumentException( "Argument 'localName' is null" );
        _namespaceURI = namespaceURI;
        _localName = localName;
    }

    /**
     * Constructs a new QName with the specified namespace URI, prefix
     * and local name.
     *
     * @param namespaceURI The namespace URI if known, or null
     * @param prefix The namespace prefix if known, or null
     * @param localName The local name
     */
    public QName( String namespaceURI, String prefix, String localName ) {
        if ( localName == null )
            throw new IllegalArgumentException( "Argument 'localName' is null" );
        _namespaceURI = namespaceURI;
        _prefix = prefix;
        _localName = localName;
    }

    /**
     * Returns the namespace URI. Returns null if the namespace URI
     * is not known.
     *
     * @return The namespace URI, or null
     */
    public String getNamespaceURI() {
        return _namespaceURI;
    }

    /**
     * Returns the namespace prefix. Returns null if the namespace
     * prefix is not known.
     *
     * @return The namespace prefix, or null
     */
    public String getPrefix() {
        return _prefix;
    }

    /**
     * Returns the local part of the qualified name.
     *
     * @return The local part of the qualified name
     */
    public String getLocalName() {
        return _localName;
    }

}

--------------2B1A06767E2042F7759536A8
Content-Type: text/plain; charset=us-ascii; name="Serializer.java"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="Serializer.java"

package org.xml.serialize;

import java.io.Writer;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import org.xml.sax.DocumentHandler;
import org.xml.sax.ContentHandler;

/**
 * Interface to a serializer implementation. A serializer is created
 * from the {@link SerializerFactory}. Prior to using the serializer,
 * the output format and output stream or writer should be set. The
 * serializer is then used in one of three ways:
 * <ul>
 * <li>To serialize SAX 1 events call {@link #asDocumentHandler}
 * <li>To serialize SAX 2 events call {@link #asContentHandler}
 * <li>To serialize a DOM document call {@link #asDOMSerializer}
 *     (see {@link DOMSerializer})
 * </ul>
 *
 * @version
 * @author Assaf Arkin
 * @author Scott Boag
 */
public interface Serializer {

    /**
     * Specifies an output stream to which the document should be
     * serialized. This method should not be called while the
     * serializer is in the process of serializing a document.
     * <p>
     * The encoding specified in the {@link OutputFormat} is used, or
     * if no encoding was specified, the default for the selected
     * output method.
     *
     * @param output The output byte stream
     * @throws UnsupportedEncodingException The encoding specified in
     *   the output format is not supported
     */
    public void setOutputByteStream( OutputStream output )
        throws UnsupportedEncodingException;

    /**
     * Specifies a writer to which the document should be serialized.
     * This method should not be called while the serializer is in
     * the process of serializing a document.
     * <p>
     * The encoding specified for the {@link OutputFormat} must be
     * identical to the encoding used with the writer.
     *
     * @param output The output character stream
     */
    public void setOutputCharStream( Writer output );

    /**
     * Specifies an output format for this serializer. If the
     * serializer has already been associated with an output format,
     * it will switch to the new format. This method should not be
     * called while the serializer is in the process of serializing
     * a document.
     *
     * @param format The output format to use
     */
    public void setOutputFormat( OutputFormat format );

    /**
     * Return a {@link DocumentHandler} interface into this serializer.
     * If the serializer does not support the {@link DocumentHandler}
     * interface, it should return null.
     */
    public DocumentHandler asDocumentHandler();

    /**
     * Return a {@link ContentHandler} interface into this serializer.
     * If the serializer does not support the {@link ContentHandler}
     * interface, it should return null.
     */
    public ContentHandler asContentHandler();

    /**
     * Return a {@link DOMSerializer} interface into this serializer.
     * If the serializer does not support the {@link DOMSerializer}
     * interface, it should return null.
     */
    public DOMSerializer asDOMSerializer();

}

--------------2B1A06767E2042F7759536A8
Content-Type: text/plain; charset=us-ascii; name="SerializerFactory.java"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="SerializerFactory.java"

package org.xml.serialize;

/**
 * Factory for creating new serializers.
 *
 * @version
 * @author Scott Boag
 * @author Assaf Arkin
 */
public interface SerializerFactory {

    /**
     * Returns a serializer for the specified output method. Returns
     * null if no implementation exists that supports the specified
     * output method. For a list of the default output methods see
     * {@link Method}.
     *
     * @param method The output method
     * @return A suitable serializer, or null
     */
    public Serializer getSerializer( String method );

}

--------------2B1A06767E2042F7759536A8--
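
The proposal states that a Serializer is reusable (point it at a new
output and use it again) but handles only one document at a time. The
sketch below illustrates that usage pattern only. It is not part of the
attached package: SerializerSketch, DomSerializer, TextSerializer, parse
and serializeAll are hypothetical names invented for this illustration,
and the toy "text" serializer merely mirrors the shape of
setOutputCharStream and asDOMSerializer from the attached Serializer
interface. It assumes the JAXP parser bundled with the JDK to build the
DOM documents.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SerializerSketch {

    /** Hypothetical, trimmed-down stand-in for the proposed DOMSerializer. */
    interface DomSerializer {
        void serialize(Document doc) throws IOException;
    }

    /** Hypothetical "text" method serializer: writes character data only. */
    static final class TextSerializer implements DomSerializer {
        private Writer out;

        // Same shape as Serializer.setOutputCharStream in the proposal.
        void setOutputCharStream(Writer output) {
            this.out = output;
        }

        // Same shape as Serializer.asDOMSerializer; this class plays both roles.
        DomSerializer asDOMSerializer() {
            return this;
        }

        @Override
        public void serialize(Document doc) throws IOException {
            writeText(doc.getDocumentElement());
            out.flush();
        }

        // Depth-first walk that emits text and CDATA node values in document order.
        private void writeText(Node node) throws IOException {
            short type = node.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                out.write(node.getNodeValue());
            }
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                writeText(c);
            }
        }
    }

    /** Parses a well-formed XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    /** Serializes each document in turn with ONE serializer, swapping the writer. */
    static String[] serializeAll(String... xmlDocs) {
        TextSerializer serializer = new TextSerializer();
        String[] results = new String[xmlDocs.length];
        try {
            for (int i = 0; i < xmlDocs.length; i++) {
                StringWriter sink = new StringWriter();
                serializer.setOutputCharStream(sink);   // re-point the same instance
                serializer.asDOMSerializer().serialize(parse(xmlDocs[i]));
                results[i] = sink.toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        for (String s : serializeAll("<a>Hello, <b>world</b></a>", "<n>42</n>")) {
            System.out.println(s);
        }
    }
}
```

The documents are processed strictly one after another, which matches
the "not multi-threaded" restriction in the proposal; an application
that needs concurrent output would presumably obtain one serializer per
thread from the factory.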