xmlgraphics-fop-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Xmlgraphics-fop Wiki] Update of "LineLayout/WhitespaceHandling" by ManuelMall
Date Tue, 18 Oct 2005 09:08:13 GMT
The following page has been changed by ManuelMall:

The comment on the change is:
Initial discussion page on WhitespaceHandling

New page:
Part of the formatting process is the correct handling of white space. The rules and properties
governing this process have been rewritten in the [http://www.w3.org/TR/xsl11/ XSL-FO 1.1
Working Draft] in [http://lists.w3.org/Archives/Public/xsl-editors/2003JulSep/0012 response
to comments by Karen Lease]. This rewrite has attempted to clarify white space handling and
moved most of the white space handling processes from the refinement step to the area creation
step and in particular to the line and inline building processes. However, the changes do not
appear intended to alter XSL-FO 1.0 white space handling; they are merely a clarification.
In the following, white space handling is therefore discussed in terms of the XSL-FO 1.1 WD,
under the assumption that the outcomes are also valid for XSL-FO 1.0.

= Properties and Rules in the XSL-FO 1.1 WD related to white space handling =
The following properties control white space handling:
 * [http://www.w3.org/TR/xsl11/#linefeed-treatment 7.16.7 linefeed-treatment]
 * [http://www.w3.org/TR/xsl11/#white-space-treatment 7.16.8 white-space-treatment]
 * [http://www.w3.org/TR/xsl11/#white-space-collapse 7.16.12 white-space-collapse]
 * [http://www.w3.org/TR/xsl11/#suppress-at-line-break 7.17.3 suppress-at-line-break]

An additional important description of parts of the white space handling process can be found
in [http://www.w3.org/TR/xsl11/#area-linebuild 4.7.2 Line-building].

= What is white space? =
XSL-FO defines white space as any character whose Unicode value is classified as white space
in XML. This means only U+0020 (space), U+0009 (tab), U+000D (carriage return) and U+000A
(linefeed) are white space characters in XSL-FO. Note that the set of white space characters
is therefore much smaller than the set of word-breaking characters, especially once non-Western
scripts are taken into account.
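
To make the definition concrete, here is a minimal sketch of the classification in Python. The names `XML_WHITE_SPACE` and `is_xml_white_space` are hypothetical and exist only for illustration; they are not taken from FOP or from the spec.

```python
# The four characters that XML classifies as white space.
XML_WHITE_SPACE = frozenset("\u0020\u0009\u000D\u000A")


def is_xml_white_space(ch: str) -> bool:
    """Return True if ch is one of the four XML white space characters."""
    return ch in XML_WHITE_SPACE


# Characters such as U+00A0 (no-break space) or U+200B (zero width space)
# are not white space in this sense, even though U+200B is a legal
# line break opportunity.
```

Under this definition a check like `is_xml_white_space("\u200b")` is false, which illustrates the gap between white space and word-breaking characters.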

/!\ The spec seems to imply under rules 5. and 6. in 4.7.2 that only white space glyph areas
can be deleted. Any other word-breaking characters are therefore not removed around line breaks,
which is worth noting.

= Processing model =
One problem in understanding XSL-FO white space handling is deriving a suitable processing
model that matches the intention of the specification. The specification itself is in parts
contradictory about which processing takes place at which stage of the XSL-FO model. Here is
my (ManuelMall) first attempt at describing a conforming(?) white space handling process.

== Step 1. Refinement: linefeed-treatment ==
All fo:character objects which have a character property value of U+000A are dealt with according
to the setting of the linefeed-treatment property. This is fairly straightforward and involves
either preservation or deletion of the fo:character object, or replacement of its character
property value with a new value of U+0020 (space) or U+200B (zero width space).
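
Step 1 can be sketched as follows. This is a simplification under stated assumptions: the sequence of fo:character objects is modelled as a plain Python list of one-character strings, and the function name `apply_linefeed_treatment` is hypothetical. The four property values and the default ("treat-as-space") are those of the linefeed-treatment property.

```python
def apply_linefeed_treatment(chars, treatment="treat-as-space"):
    """Apply linefeed-treatment to a list of one-character strings.

    Assumption: each list element stands in for one fo:character object.
    """
    out = []
    for ch in chars:
        if ch != "\n":
            out.append(ch)          # not a linefeed: untouched by this step
        elif treatment == "preserve":
            out.append(ch)          # keep the linefeed as it is
        elif treatment == "ignore":
            continue                # delete the fo:character object
        elif treatment == "treat-as-space":
            out.append("\u0020")    # replace with a space
        elif treatment == "treat-as-zero-width-space":
            out.append("\u200b")    # replace with a zero width space
        else:
            raise ValueError(f"unknown linefeed-treatment: {treatment!r}")
    return out
```

For example, `apply_linefeed_treatment(list("a\nb"))` yields the three characters `a`, space, `b`.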

== Step 2. Refinement: white-space-collapse ==
The processing model presented here deviates from the text in the specification (not necessarily
the intent though) as the specification makes white-space-collapse an area tree construction
activity. However, the remainder of the description of the [http://www.w3.org/TR/xsl11/#white-space-collapse
white-space-collapse] property refers only to fo:character objects and their direct siblings
in the fo tree with certain character property values. It also refers directly to fo:character
objects with a character property value of U+000A (linefeed) but does not refer to line breaks.
All this leads to the conclusion that collapsing white space is really a refinement activity.
The actual processing is again straightforward: if the property value is "false", just skip
the step. If it is "true", then for any sequence of direct sibling fo:character objects whose
character property value is an XML white space value other than U+000A, retain only the first
fo:character object and delete all others. Any remaining non-linefeed white space fo:character
objects which are adjacent to a U+000A (linefeed) fo:character are also deleted.
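
The two passes described above can be sketched like this. As before, this is only an illustration under assumptions: sibling fo:character objects are modelled as a list of one-character strings, and the names `NON_LF_WHITE_SPACE` and `collapse_white_space` are hypothetical.

```python
# XML white space other than U+000A (linefeed): space, tab, carriage return.
NON_LF_WHITE_SPACE = frozenset("\u0020\u0009\u000D")


def collapse_white_space(chars, collapse=True):
    """Apply white-space-collapse to a list of one-character strings."""
    if not collapse:
        return list(chars)  # "false": skip the step entirely

    # Pass 1: in every run of non-linefeed white space keep only the first.
    # The retained character is kept as is (a leading tab stays a tab).
    collapsed = []
    for ch in chars:
        if ch in NON_LF_WHITE_SPACE and collapsed and collapsed[-1] in NON_LF_WHITE_SPACE:
            continue
        collapsed.append(ch)

    # Pass 2: delete remaining non-linefeed white space next to a linefeed.
    result = []
    for i, ch in enumerate(collapsed):
        if ch in NON_LF_WHITE_SPACE:
            before_lf = i + 1 < len(collapsed) and collapsed[i + 1] == "\n"
            after_lf = i > 0 and collapsed[i - 1] == "\n"
            if before_lf or after_lf:
                continue
        result.append(ch)
    return result
```

Note that pass 2 only has an effect if linefeeds survived step 1, i.e. with linefeed-treatment="preserve". The sketch also ignores the properties attached to each fo:character, exactly as the spec text appears to do.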

=== Issues ===
/!\ The spec does not mention replacement of any white space that is not a U+0020 (space)
or U+000A (linefeed) with a space. This seems to indicate that U+0009 (tab) and U+000D (carriage
return) are left unchanged in the fo tree. The current FOP version does replace those with
a space. That seems reasonable but is it compliant?

/!\ The spec under white-space-collapse refers to removing adjacent white space next to U+000A
(linefeed) characters. This is different to removing white space around line breaks as the
formatter will generate line breaks even without a linefeed being present. Is that intentional
or just an unfortunate wording in the spec?

/!\ The spec does not put any constraint on collapsing white space with different properties.
   <fo:character font-size="80pt" character=" "/>
   <fo:character border="2pt solid red" font-size="10pt" character=" "/>
would be collapsed leaving only the initial space character. Is that intentional? This should
be contrasted with the description in 4.7.2 of glyph merging/replacement which clearly states
that only glyphs with matching properties can be merged/substituted.

== Step 3. line-building: white-space-treatment and suppress-at-line-break ==
The white-space-treatment and the related suppress-at-line-break properties are dealt with during
[http://www.w3.org/TR/xsl11/#area-linebuild 4.7.2 line-building].

/!\ While there are contradictions in the spec in that both [http://www.w3.org/TR/xsl11/#white-space-treatment
7.16.8 white-space-treatment] and [http://www.w3.org/TR/xsl11/#white-space-collapse 7.16.12
white-space-collapse] still mention refinement as the stage in which these properties are
dealt with, these are most likely editorial mistakes.

While the white-space-treatment process described in 4.7.2 item 6. is quite clear, some
challenges remain.
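
As a rough illustration of the effect of white-space-treatment, the sketch below operates on lines that have already been broken. This is a strong simplification and rests on several assumptions of mine: a real formatter works on glyph areas and decides break points and white space removal together, the start of the first line and the end of the last line are treated like any other line edge, and all white space at a line edge is removed rather than a single glyph. The function name `apply_white_space_treatment` is hypothetical; the property values and the default are those of the white-space-treatment property.

```python
# XML white space other than U+000A (linefeed).
NON_LF_WS = "\u0020\u0009\u000D"


def apply_white_space_treatment(lines, treatment="ignore-if-surrounding-linefeed"):
    """Remove white space from already broken lines (list of strings)."""
    result = []
    for line in lines:
        if treatment == "preserve":
            pass                                  # keep everything
        elif treatment == "ignore":
            # delete all non-linefeed white space
            line = "".join(ch for ch in line if ch not in NON_LF_WS)
        elif treatment == "ignore-if-before-linefeed":
            line = line.rstrip(NON_LF_WS)         # white space before the break
        elif treatment == "ignore-if-after-linefeed":
            line = line.lstrip(NON_LF_WS)         # white space after the break
        elif treatment == "ignore-if-surrounding-linefeed":
            line = line.strip(NON_LF_WS)          # both sides of the break
        else:
            raise ValueError(f"unknown white-space-treatment: {treatment!r}")
        result.append(line)
    return result
```

With the default value, `apply_white_space_treatment([" a b ", " c "])` gives `["a b", "c"]`, while "preserve" returns the lines unchanged.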

=== Issues ===

/!\ As with white-space-collapse, the spec does not put any constraint on deleting white space
with different properties under white-space-treatment. Intentional or not?

/!\ When deleting white space with white-space-treatment set to "ignore", should we remember
that there was white space and keep that as a legal line break point, i.e. treat it like "treat-as-zero-width-space"?

/!\ For the white-space-treatment values "ignore-if-...-linefeed" (which actually is a misnomer
and should be "ignore-if-...-linebreak") and "preserve", Knuth sequences need to be constructed
which enforce those values. This could be especially complex in the case of white-space-collapse="false".

/!\ The interaction of white-space-treatment values "ignore-if-...-linefeed" and "preserve"
in conjunction with white-space-collapse="false" and text-align="justify" on the Knuth sequences
and calculated word-spacing values needs further study.

= Examples =
In the examples that follow, spaces are represented by '.'.

Example 1: Simple text - all properties defaulting
After step 1 (linefeed-treatment) we have:
After step 2 (white-space-collapse) we have:
After step 3 (white-space-treatment and line-building) we have:

Example 2: Simple nested block - all properties defaulting
..<fo:block background-color="green">
After step 1 (linefeed-treatment) we have (Note: the linefeeds in the block below are for
readability only!):
<fo:block>...<fo:block background-color="green">....Green.background...text...</fo:block>....This.is..some...
After step 2 (white-space-collapse) we have (Note: the linefeeds in the block below are for
readability only!):
<fo:block>.<fo:block background-color="green">.Green.background.text.</fo:block>.This.is.some.
After step 3 (white-space-treatment and line-building) we have:
/!\ After step 2 we have a sequence {{{"<fo:block>.<fo:block>"}}}. It is unclear
to me from which part of the spec it can be derived that this does not generate an empty line.
But everybody seems to be dropping this space/line.

'''This is WIP and more is to come'''
