xml-xalan-dev mailing list archives

From "Scott Boag/CAM/Lotus" <Scott_B...@lotus.com>
Subject That unwanted white space in HTML output
Date Thu, 03 Feb 2000 22:46:22 GMT
Does anyone have an opinion about changing Xalan's behavior in light of the
note below?  I have specifically followed Clark's whitespace rules, frankly
mainly so that Xalan's output can be file-compared with XT's.

This will be moot once the Xerces Serializers become the default for Xalan,
since I believe the Serializers already follow the convention outlined
below.

-scott


----- Forwarded by Scott Boag/CAM/Lotus on 02/03/00 05:43 PM -----

From:              Mike Brown <mbrown@corp.webb.net>
Sent by:           owner-xsl-list@mulberrytech.com
Date:              02/03/00 05:08 PM
Please respond to: xsl-list
To:                "'xsl-list@mulberrytech.com'" <xsl-list@mulberrytech.com>
cc:                (bcc: Scott Boag/CAM/Lotus)
Subject:           That unwanted white space in HTML output




Warren Hedley wrote:
> The whitespace between <a> and <img> elements is a fairly
> common problem [...] can anyone suggest any other element
> types where this behaviour might be necessary?

Yes, all "inline" elements. These are enumerated in the HTML 4 DTDs as the
following:

(strict)
TT | I | B | BIG | SMALL | EM | STRONG | DFN | CODE | SAMP | KBD | VAR |
CITE | ABBR | ACRONYM | A | IMG | OBJECT | BR | SCRIPT | MAP | Q | SUB |
SUP | SPAN | BDO | INPUT | SELECT | TEXTAREA | LABEL | BUTTON

(transitional)
TT | I | B | U | S | STRIKE | BIG | SMALL | EM | STRONG | DFN | CODE |
SAMP | KBD | VAR | CITE | ABBR | ACRONYM | A | IMG | APPLET | OBJECT |
FONT | BASEFONT | BR | SCRIPT | MAP | Q | SUB | SUP | SPAN | BDO | IFRAME |
INPUT | SELECT | TEXTAREA | LABEL | BUTTON

I believe a clause should be included in a future version of the XSLT spec:
"When emitting a result tree as HTML, whitespace should never be added
inside inline elements."

Example:

What would normally be emitted as unindented XML like this:
<p><a href="foo"><img src="bar"/></a><br/>some text</p>

...could be emitted as indented HTML like this:
<p>
<a href="foo"><img src="bar"/></a><br/>some text
</p>
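
As a rough illustration of the proposed rule, here is a minimal Python
sketch of a serializer that adds line breaks only directly inside
non-inline elements. It is not Xalan's or XT's code: the names INLINE,
start_tag and serialize are hypothetical, the inline set is the strict
list above, and it assumes that nothing is ever added once an inline
ancestor has been entered. Empty elements are written XML-style, as in the
example above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Inline elements from the HTML 4 strict list quoted above, lowercased.
INLINE = frozenset("""
    tt i b big small em strong dfn code samp kbd var cite abbr acronym
    a img object br script map q sub sup span bdo
    input select textarea label button
""".split())


def start_tag(elem, empty):
    """Build the start tag (or an XML-style empty tag) for elem."""
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    return f"<{elem.tag}{attrs}{'/' if empty else ''}>"


def serialize(elem, in_inline=False):
    """Serialize elem, adding newlines only directly inside non-inline elements.

    Assumption: once inside an inline element, no whitespace is added at all,
    even around nested non-inline descendants.
    """
    inline = in_inline or elem.tag.lower() in INLINE
    children = list(elem)
    if not children and not elem.text:
        return start_tag(elem, empty=True)
    body = escape(elem.text or "")
    for child in children:
        body += serialize(child, inline) + escape(child.tail or "")
    pad = "" if inline else "\n"  # the only whitespace this sketch ever adds
    return f"{start_tag(elem, empty=False)}{pad}{body}{pad}</{elem.tag}>"


SOURCE = '<p><a href="foo"><img src="bar"/></a><br/>some text</p>'
print(serialize(ET.fromstring(SOURCE)))
```

Run on the unindented example, this reproduces the indented form shown
above: the only added line breaks sit just inside <p> and </p>.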


This rule is needed because any added whitespace, together with any
adjacent whitespace, is interpreted as a single "word separator" relative
to adjacent text. The browser is supposed to render this separator in a
manner appropriate to the script of the language being used, which isn't
always predictable. In Latin-based languages, the word separator is a
breaking space.
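
The collapsing behavior can be modeled with a short Python sketch. It is a
simplification that assumes a Latin-script rendering and treats only space,
tab, form feed, CR and LF as whitespace; the function name collapse and
the sample strings are made up for illustration.

```python
import re

# Simplified set of HTML whitespace characters: space, tab, form feed, CR, LF.
_WS = re.compile(r"[ \t\f\r\n]+")


def collapse(text):
    """Replace each run of whitespace with a single space (a word separator)."""
    return _WS.sub(" ", text)


compact = 'one<img src="a.gif"/><img src="b.gif"/>two'
indented = 'one\n  <img src="a.gif"/>\n  <img src="b.gif"/>\ntwo'

print(collapse(compact))   # no separators between the text and the images
print(collapse(indented))  # each indentation run has become one space
```

In the indented variant every line break plus indentation turns into one
separator between the images and the surrounding text, which is exactly
the gap that shows up in the rendered page.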

In the case of inline images, applets and objects, you end up with the
image, applet or object being equivalent to some text, with the bottom edge
aligned along the baseline of adjacent text, as per the spec. This is
normally desirable behavior, but can be problematic if you are trying to
stack images on top of each other. The space allotted for descending
characters and the space between the bottom edge of descenders and the top
edge of the next row of text are often undesirable.

I made an example of this at http://www.skew.org/xml/misc_demos/whitespace/
and reported it to James Clark as an argument for changing the behavior of
XT's HTMLOutputHandler. He gave me a simple "thanks" for the info, but the
problem has yet to be resolved.

In the meantime, I've modified HTMLOutputHandler.java with an ugly
workaround, removing 'br' from the list of blockElements (which seems to be
an error anyway). This of course doesn't resolve every situation, but was
enough for my purposes, for now.


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list



