xerces-c-dev mailing list archives

From "Scott Paulinski" <scottp...@hotmail.com>
Subject RE: Xerces Internationalization
Date Thu, 01 Jan 1970 00:00:00 GMT

I tried different capitalizations of ISO-8859-1, but none of them worked.  
That gave me the idea that other aliases for the internal transcoders might 
work.  After digging through the Xerces code for a little while, I found 
where all the aliases for the internal transcoders are declared.  It looks 
like LATIN1 works.  LATIN1 appears to be just an alias for ISO-8859-1, so I 
have no idea why it works and ISO-8859-1 doesn't.  Thanks for your help.
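
For illustration, here is a minimal sketch of how an alias table of this kind can resolve several spellings to one canonical encoding name. It is not the Xerces implementation: the function names and the table contents are hypothetical, and it assumes the lookup upper-cases the name first.

```cpp
#include <map>
#include <string>

// Upper-case ASCII letters only, so the result does not depend on the locale.
std::string toUpperAscii(std::string s) {
    for (char& c : s) {
        if (c >= 'a' && c <= 'z') {
            c = static_cast<char>(c - 'a' + 'A');
        }
    }
    return s;
}

// Hypothetical alias table (not copied from the Xerces sources).
// Returns the canonical encoding name, or an empty string if the
// alias is unknown.
std::string canonicalEncoding(const std::string& name) {
    static const std::map<std::string, std::string> aliases = {
        {"ISO-8859-1", "ISO-8859-1"},
        {"ISO_8859-1", "ISO-8859-1"},
        {"LATIN1",     "ISO-8859-1"},
        {"L1",         "ISO-8859-1"},
    };
    const auto it = aliases.find(toUpperAscii(name));
    if (it == aliases.end()) {
        return std::string();
    }
    return it->second;
}
```

With a table like this, canonicalEncoding("latin1") and canonicalEncoding("iso-8859-1") both resolve to "ISO-8859-1", so a failure for only one spelling would point at the lookup path rather than at the transcoder itself.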

Scott Paulinski

>From: "Arnold, Curt" <Curt.Arnold@hyprotech.com>
>Reply-To: xerces-c-dev@xml.apache.org
>To: "'xerces-c-dev@xml.apache.org'" <xerces-c-dev@xml.apache.org>
>Subject: RE: Xerces Internationalization
>Date: Fri, 17 Aug 2001 13:55:50 -0600
> > 1) Include no header in the XML file being read.  This results in
> > non-English characters being read in as a ? character.
>I'm surprised that you didn't get an encoding exception since ISO-8859-1 
>code points would rarely be legal UTF-8.
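
To illustrate the point about ISO-8859-1 bytes rarely forming legal UTF-8, here is a small stand-alone validity check. It is only a sketch and is unrelated to how Xerces decodes input; the function name isValidUtf8 is made up for this example.

```cpp
#include <cstddef>
#include <string>

// Returns true if `bytes` is a well-formed UTF-8 sequence.
// Rejects stray continuation bytes, truncated sequences, overlong
// forms, surrogates, and code points above U+10FFFF.
bool isValidUtf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) {  // plain ASCII
            ++i;
            continue;
        }
        std::size_t extra = 0;     // number of continuation bytes expected
        unsigned long cp = 0;      // decoded code point
        unsigned long minCp = 0;   // smallest value allowed for this length
        if ((lead & 0xE0) == 0xC0) {
            extra = 1; cp = lead & 0x1F; minCp = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2; cp = lead & 0x0F; minCp = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3; cp = lead & 0x07; minCp = 0x10000;
        } else {
            return false;  // continuation byte or invalid lead byte
        }
        if (n - i - 1 < extra) {
            return false;  // sequence is cut off at the end of the input
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false;  // expected a 10xxxxxx continuation byte
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;  // overlong form, out of range, or surrogate
        }
        i += extra + 1;
    }
    return true;
}
```

The ISO-8859-1 byte 0xE9 (e with acute accent) looks like the lead byte of a three-byte UTF-8 sequence, so the check rejects it unless two continuation bytes happen to follow; the UTF-8 form of the same character, 0xC3 0xA9, passes.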
> >
> > 2) Including the header <?xml version="1.0"
> > encoding="iso-8859-1" ?>.  This
> > causes the file not to be read at all.  Looking at the Xerces
> > code I was
> > able to track down one of the problems to the way Xerces
> > detects codepages
> > in Win32TransService.cpp.  In the constructor it checks for
> > the codepages on
> > the machine by looking in the registry under
> > HKCR\MIME\Database\Codepage
> > (and Charset), which doesn't exist on a base Windows 95 system.
>There does seem to be an internal ISO-8859-1 transcoder, but it looks like 
>it might be sensitive to capitalization.  What happens if you use 
> >
> > I was able to add this set of registry keys by installing IE
> > 4.01, but the
> > iso-8859-1 encoding still doesn't work for non-English
> > characters.  In this
> > case Xerces ignores the entire file if it contains such characters.
> > Unfortunately, the 1252 codepage (which is what iso-8859-1
> > looks like it is
> > mapped to) appears to be the only one installed on this
> > version of Windows
> > 95.  The 1252 codepage is named "Western European (Windows)"
> > in the registry
> > which sounds like the character set I am looking for.
> > Looking at Xerces
> > documentation it looks like they support iso-8859-1 as "ISO
> > Latin 1" which
> > sounds promising as well.  So it looks like I am using the
> > proper codepage,
> > but it just isn't working for some reason.
>CP-1252 is ISO-8859-1 plus a few additional characters: 0x82 through 0x8C, 
>0x91 through 0x9C, and 0x9F.
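
As a sketch of why the two code pages are so close: every ISO-8859-1 byte maps directly to the Unicode code point with the same value, and CP-1252 can only differ from it in the 0x80 to 0x9F block. The function names below are hypothetical and are not taken from Xerces.

```cpp
#include <string>

// True for bytes in 0x80..0x9F, the only block where CP-1252 can
// differ from ISO-8859-1 (ISO-8859-1 has C1 control codes there).
bool mayDifferInCp1252(unsigned char b) {
    return b >= 0x80 && b <= 0x9F;
}

// ISO-8859-1 to UTF-16: each byte value is the Unicode code point,
// so transcoding is a plain widening of every byte.
std::u16string latin1ToUtf16(const std::string& bytes) {
    std::u16string out;
    out.reserve(bytes.size());
    for (char c : bytes) {
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

Accented Western European letters such as 0xE9 sit above 0x9F, so they decode to the same character under either code page.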
> >
> > On a side note, I found that using iso-8859-3 (1254) does
> > allow Xerces to
> > use these non-English characters.  Though this encoding is
> > not installed on
> > these Windows 95 systems.  If anyone knows an easy way to
> > install this
> > encoding (without installing a whole application like IE)
> > that would be
> > helpful as well.
> >
> > Any help is greatly appreciated.
>There are also a few unnecessary dependencies on IE 4 components (urlmon 
>and wininet) in the COM wrapper.
>For equivalence with MSXML, the COM wrapper provides an XMLHttpRequest 
>object that is implemented using WININET.  Unfortunately, this causes 
>xml4com.dll not to load if IE4+ isn't present even if you
>weren't planning on using XMLHttpRequest.  I have a personal copy that has 
>rewritten XMLHttpRequest so that it dynamically loads WININET if and only 
>if you try to do something with XMLHttpRequest.
>Also, XMLDOMDocument makes calls to PathIsURL, PathIsRelative and 
>URLDownloadToCacheFile in urlmon.  PathIsURL and PathIsRelative can both be 
>trivially implemented locally.  For
>URLDownloadToCacheFile, my proxy will return the local file name if the URL 
>is a local file and dynamically load urlmon if the url is remote.  This at 
>least allows you to parse local files without
>having IE present.
>Since the COM wrapper is moderately comatose and Win95 without IE 4 even 
>more so, I haven't prep'd these changes for inclusion in the CVS.  However, 
>if you would like them as is, let me know.
>To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
>For additional commands, e-mail: xerces-c-dev-help@xml.apache.org
