httpd-users mailing list archives

From greg wm <apa...@nvpf.org>
Subject [users@httpd] meta http-equiv useless??
Date Sat, 20 Aug 2005 20:29:59 GMT
hi folks,

i've landed in a character set mire, i need help from someone who knows..

i used wget to copy the entire http://nonviolentpeaceforce.org site to
http://nvpf.org/np.  the former is in m$ asp, the latter captured as html.

for example, http://nonviolentpeaceforce.org/spanish/welcome.asp was
captured to http://nvpf.org/np/spanish/welcome.asp.html

as you can see, the capture is mostly fine, including spanish characters
in the text (eg año), however the spanish characters in the menus didn't
do quite so well (eg Misi?n)

in the file año appears as a&ntilde;o which is apparently "good", but
Misi?n appears as Misión, which is apparently "bad".

first question:  why is that bad?

if i tell galeon, instead of automatic encoding, use western iso-8859-1,
then, presto, the page appears nicely.  but i don't have to do that to 
see the original, nor do i have to do that for anybody else's pages, and 
of course i can't expect our audience to go and fiddle with that in 
their browsers.

second question:  why doesn't the meta http-equiv header do anything?

right after the title the file says <meta http-equiv="Content-Type" 
content="text/html; charset=iso-8859-1">.  why isn't that good enough? 
why does it make no difference at all what i change it to?  i tried 
utf-8, Utf-8, UTF-8, Windows-1252, none have any effect tho i can see 
them if i tell my browser to view source.
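
One possible explanation for the second question, not confirmed by anything in this message: under HTML 4.01 a charset parameter in the HTTP Content-Type response header takes precedence over a meta http-equiv declaration. If the server sends its own charset, editing the meta tag would change nothing. Below is a minimal Python sketch of that precedence rule. The function names are hypothetical, and the header values in the usage note are assumed examples, not what nvpf.org actually sends.

```python
import re
from email.message import Message

# Matches a charset declaration inside a <meta ...> tag in raw HTML bytes.
META_RE = re.compile(rb'<meta[^>]+charset=[\'"]?([A-Za-z0-9_-]+)', re.IGNORECASE)


def header_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(html_bytes):
    """Return the lowercased charset named in the first matching meta tag, or None."""
    match = META_RE.search(html_bytes)
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, html_bytes):
    """Apply the HTML 4.01 priority: HTTP header first, meta tag second."""
    return header_charset(content_type) or meta_charset(html_bytes)
```

With a header value of "text/html; charset=UTF-8", effective_charset reports utf-8 whatever the meta tag says. With a bare "text/html", the meta tag's value is used. In Apache, the AddDefaultCharset directive is one setting that controls whether a charset parameter is added to the header, so it may be worth checking.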

third question:  is there some other setting in apache i should tweak?

from my blithely naive viewpoint, apache should be able to spit out 
those &whatever; escapes in place of the corresponding characters found 
in my html.  is that feature hiding in there somewhere?

fourth question:  can wget be tweaked to do better?

i think those menus were rendered out of some .asp database or
whatever, differently than the rest of the text of the page.  but so 
what?  why didn't wget capture something identical to what my
browser shows?

the command i ran was
wget -ENKkrl19 -nH -w2 -owget.log http://nonviolentpeaceforce.org

i also tried prepending LC_ALL=en_US to that, in hopes of avoiding the 
utf-8 default.  the result was different, but similarly disappointing: 
http://nvpf.org/np/t/spanish/welcome.asp.html

well whatever, thunk i, no problem, i'll just find and replace.  well 
ha.  i haven't yet managed to craft sed to capture the buggers!  it's 
all making me feel dang defeated..
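
As an alternative to sed, the find-and-replace could be sketched in Python. This assumes the captured files hold raw ISO-8859-1 bytes, which the message suggests but does not confirm (the page displays correctly when the browser is forced to iso-8859-1). The function name is hypothetical. Every byte above 0x7F is rewritten as a named HTML entity where one exists, otherwise as a numeric reference, so the output is pure ASCII.

```python
from html.entities import codepoint2name


def latin1_to_entities(raw: bytes) -> bytes:
    """Rewrite non-ASCII ISO-8859-1 bytes as HTML entities; ASCII passes through."""
    out = []
    for byte in raw:
        if byte < 0x80:
            # Plain ASCII, including existing markup and entities, is kept as-is.
            out.append(chr(byte))
        elif byte in codepoint2name:
            # In ISO-8859-1 the byte value equals the Unicode code point.
            out.append("&%s;" % codepoint2name[byte])
        else:
            # No named entity (e.g. 0x80-0x9F): fall back to a numeric reference.
            out.append("&#%d;" % byte)
    return "".join(out).encode("ascii")
```

Applied to the bytes of a file read in binary mode, this would turn a raw 0xF3 in "Misión" into &oacute; and leave an existing a&ntilde;o untouched. If the files turn out to be UTF-8 instead, the assumption does not hold and the bytes would need decoding first.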

any help?

tia,
greg

Greg Whitley Mott
IT Coordinator
NonviolentPeaceforce.org

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org

