forrest-dev mailing list archives

From Thorsten Scherler <thors...@apache.org>
Subject Re: character entities
Date Sun, 22 Jan 2006 12:35:06 GMT
On Sat, 21-01-2006 at 22:41 +0800, Gav.... wrote:
> I wrote earlier :-
> 
> | P.S-
> |
> | Creating an empty hook has the side effect of a self-closing div,
> | obviously invalid, which messes
> | up the rest of the page.
> |
> | e.g.
> |
> | <forrest:hook name="headlines"></forrest:hook>
> | creates
> | <div id="headlines" />
> | instead of
> | <div id="headlines"></div>
> |
> 
> This made me have a look around, and as a side-effect found an old
> thread regarding this back in July 2005, just before my time I think.
> 
> The discussion also brought up &#160; and its possible side effect
> of creating a stray Â.

Yeah, I am running into the same problem at the moment. You can see it online
at http://lenya.zones.apache.org/ as "Today:Â..." (I am working on a fix right
now, so the site may already be fixed when you visit it).

On my local machine that does not happen. I ran into a similar
problem with Cocoon on a job I did a while ago. There the problem lay in
the server configuration, so I just looked at our zones server:
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
-bash-3.00$ locale
LANG=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=

The locale is not UTF-8!

Doing "locale" on my machine gives:
LANG=es_ES.UTF-8
LC_CTYPE="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_PAPER="es_ES.UTF-8"
LC_NAME="es_ES.UTF-8"
LC_ADDRESS="es_ES.UTF-8"
LC_TELEPHONE="es_ES.UTF-8"
LC_MEASUREMENT="es_ES.UTF-8"
LC_IDENTIFICATION="es_ES.UTF-8"
LC_ALL=

That is UTF-8!

The solution back on the job was setting the language to de_DE.UTF-8
before starting the tomcat server.
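
A quick way to compare the two "locale" outputs above is to check which
setting wins for character handling. The sketch below is only an
illustration: the helper names effective_ctype and is_utf8_locale are
made up for this mail, and it only inspects the variable values (using the
POSIX precedence LC_ALL, then LC_CTYPE, then LANG, with empty values
treated as unset). It does not ask the system whether the locale is
actually installed.

```python
import os


def effective_ctype(env):
    """Return the locale name that applies to LC_CTYPE.

    POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
    Empty values count as unset; the fallback is the "C" locale.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:
            return value
    return "C"


def is_utf8_locale(env):
    """Heuristic: does the effective locale name carry a UTF-8 codeset?"""
    name = effective_ctype(env)
    # Locale names look like language_TERRITORY.codeset@modifier
    codeset = name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


# Environments modelled on the two "locale" outputs in this mail.
zones_server = {"LANG": "", "LC_ALL": ""}
my_machine = {"LANG": "es_ES.UTF-8", "LC_ALL": ""}

if __name__ == "__main__":
    print("zones server UTF-8?", is_utf8_locale(zones_server))
    print("my machine UTF-8?  ", is_utf8_locale(my_machine))
    print("this process UTF-8?", is_utf8_locale(os.environ))
```

Running the same check in the environment that starts the server should
show whether the UTF-8 setting has actually reached that process.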

> 
> I am seeing Â a lot lately and so think the problem is still not solved.

Yeah, we need to find the root cause of this problem. My guess is that it
lies in the "locale".

> For instance it is the result of siteinfo-last-published.ft.
> 
> Last Published:Â 01/21/2006 17:53:37
> 
> This line is created from the contract here :-
> 
> <xsl:template name="siteinfo-last-published-body">
>   <script type="text/javascript">document.write("<i18n:text>Last Published:</i18n:text>&#160;" + document.lastModified);</script>
> </xsl:template>
> 
> I have a theory.
> 
> In the code the &#160; is right next to a " (quote). The entity number for
> " (quote) is &#34;.
> 
> I had a crazy thought that maybe they were adding together somehow
> (160 + 34 = 194):
> 
> &#160; + &#34; = &#194;
> 
> Guess what &#194; is equal to? Yup:
> 
> Â

Hehe, nice theory, but I have an example that does not combine &#160;
with " and still shows the described behavior.

        <forrest:hook class="breadtrail">
          <forrest:contract name="genericMarkup">
            <forrest:properties contract="genericMarkup">
              <forrest:property name="genericMarkup">
                <strong>&#160;</strong>
              </forrest:property>
            </forrest:properties>
          </forrest:contract>
        </forrest:hook>


> 
> Anyway, in the example contract code above, removing &#160; and putting a
> real space in there cures it, and the space is preserved without any
> problem as it is enclosed between the quotes.
> 
> I copied the contract to my /pili/html/ directory and it now renders
> correctly.
> 
> Have I missed something here, or was it that simple?
> 

Actually, according to http://www.html-world.de/program/html_sz.php
Unicode: &nbsp;
XML: &#160;
Result: space

We are using the right code for spaces. I guess you need to set the
locale to UTF-8 on the server (before starting httpd), since forrest is
rendering it on lenya.zones just fine but the httpd is delivering the
content with "Â".
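
To make the "Â" less mysterious, here is a minimal sketch of one way it
can appear. It assumes the page is written as UTF-8 and then read as
ISO-8859-1 somewhere between Forrest and the browser; that mismatch is an
assumption for illustration, not something confirmed for lenya.zones. The
function name mojibake is made up for this example.

```python
NBSP = "\u00a0"  # the character that &#160; refers to


def mojibake(text, wrong_encoding="iso-8859-1"):
    """Encode text as UTF-8, then decode the bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_encoding)


# In UTF-8 the non-breaking space takes two bytes: 0xC2 0xA0.
utf8_bytes = NBSP.encode("utf-8")

# Read as ISO-8859-1, each byte becomes its own character:
# 0xC2 (decimal 194) is "Â" and 0xA0 is again a non-breaking space.
garbled = mojibake("Today:" + NBSP)

if __name__ == "__main__":
    print([hex(b) for b in utf8_bytes])
    print(repr(garbled))
```

Under that assumption the first UTF-8 byte is 0xC2, which is decimal 194.
That would explain why &#194; turns up even when no quote character is
anywhere near the &#160;.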

More information about UTF-8 and Unicode can be found at [1]. The guide to
setting it up in Gentoo [2] is in Spanish only, but I reckon you can still
get the idea from the code examples. Finally, [3] is a shorter howto in
English.

> Still looking into the empty <div> problem, but again it may be &#160;
> related, I'm not sure.

No, IMO not, see above.

> There was talk in the archives it was cured, but seems to be back.
> 

forrest-trunk/whiteboard/plugins/org.apache.forrest.plugin.internal.structurer/resources/stylesheets/hooksMatcher.xsl
...
<xsl:if test="@nbsp='true'">&#160;</xsl:if>
...

Meaning 
<forrest:hook name="headlines"></forrest:hook>
creates
<div id="headlines" />

and <forrest:hook name="headlines" nbsp="true"></forrest:hook>
gives
<div id="headlines"> </div>
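
The same effect can be reproduced outside Forrest. The sketch below uses
Python's xml.etree.ElementTree purely as a stand-in for a generic XML
serializer; it is not the Forrest or Cocoon code path, and render_hook is
a hypothetical helper that only mimics the nbsp switch described above.

```python
import xml.etree.ElementTree as ET


def render_hook(name, nbsp=False):
    """Serialize a div for a hook; optionally fill it with a non-breaking space."""
    div = ET.Element("div", {"id": name})
    if nbsp:
        # Any text content keeps the serializer from collapsing the element.
        div.text = "\u00a0"
    return ET.tostring(div, encoding="unicode")


if __name__ == "__main__":
    # An element without content is written in the short, self-closing form.
    print(render_hook("headlines"))
    # With the non-breaking space inside, open and close tags are both written.
    print(repr(render_hook("headlines", nbsp=True)))
```

An XML serializer is free to collapse an empty element, which is valid XML
but not what browsers expect for a div in HTML. Putting a character inside
the element is one way to force the explicit closing tag.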

We need to document this in the howto.

HTH
Regards

[1] http://www.cl.cam.ac.uk/~mgk25/unicode.html
[2] http://es.gentoo-wiki.com/Unicode-UTF-8
[3] http://www.linux.com/howtos/Indic-Fonts-HOWTO/locale.shtml
-- 
thorsten

"Together we stand, divided we fall!" 
Hey you (Pink Floyd)

