xerces-j-users mailing list archives

From Aleksandr Kravets <akravets.w...@gmail.com>
Subject Re: carriage return in attribute
Date Mon, 02 Mar 2009 16:39:53 GMT
Thanks Michael,

I understand about XML rules for processing of carriage returns. I am
dealing with an XML document that is being imported into my application. I
am not sure if it has been serialized correctly or not, but if I read
through this document byte-by-byte I see carriage return (13) and newline
(10) as termination characters in an attribute that is a String. I know it's
probably wrong to put these characters in an attribute and this should have
been a value of the element inside a CDATA, but this is the document that I
need to work with.
So once I parse this document, all CRLFs are converted to LFs and I am left
with a line with newlines, which changes how this attribute is displayed:
the string is displayed in line instead of having newlines visible.
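
For illustration, this behavior can be reproduced with the standard JAXP
API. This is only a minimal sketch: the class name, the "note" element and
the "text" attribute are made up for the example and are not from the real
document. A literal CRLF inside an attribute value comes back as a single
space, while the character references &#xD;&#xA; survive parsing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; element and attribute names are illustrative only.
class AttrNormalizationDemo {

    // Parses the given XML string and returns the root element's "text" attribute.
    static String parseAttr(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("text");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        // Literal CR LF inside the attribute value.
        String literal = "<note text=\"line one\r\nline two\"/>";
        // Same line break written as character references.
        String escaped = "<note text=\"line one&#xD;&#xA;line two\"/>";

        // End-of-line handling turns CRLF into LF, then attribute-value
        // normalization turns that LF into a space.
        System.out.println(parseAttr(literal).equals("line one line two"));

        // Character references are not normalized away.
        System.out.println(parseAttr(escaped).equals("line one\r\nline two"));
    }
}
```
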
Now, I guess I can read through the document before it is imported (without
a parser) and replace all CRLFs with &#xA; to make it correct. However, this
would be ugly, and I was wondering if there is an easier way to deal with
this.
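
To make the idea concrete, the pre-processing step could look roughly like
the sketch below. It is not a Xerces utility; the class and method names are
hypothetical. It assumes the file has already been decoded into a String
with the right encoding, and that the document has no comments, CDATA
sections, processing instructions or DOCTYPE internal subset containing
quote characters or '>' that would confuse this simple scanner. Line breaks
are rewritten only inside quoted attribute values; element content is left
alone.

```java
// Hypothetical helper, not part of Xerces or JAXP.
class AttrLineBreakEscaper {

    // Replaces literal CRLF, CR or LF inside quoted attribute values with &#xA;.
    static String escapeLineBreaksInAttributes(String xml) {
        StringBuilder out = new StringBuilder(xml.length() + 16);
        boolean inTag = false; // true between '<' and the matching '>'
        char quote = 0;        // the open quote character, or 0 when outside a value

        for (int i = 0; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (quote != 0) {
                if (c == '\r' || c == '\n') {
                    // Treat CRLF as one line break, like the parser does.
                    if (c == '\r' && i + 1 < xml.length() && xml.charAt(i + 1) == '\n') {
                        i++;
                    }
                    out.append("&#xA;");
                    continue;
                }
                if (c == quote) {
                    quote = 0; // end of the attribute value
                }
            } else if (inTag) {
                if (c == '"' || c == '\'') {
                    quote = c; // start of an attribute value
                } else if (c == '>') {
                    inTag = false;
                }
            } else if (c == '<') {
                inTag = true;
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "<note text=\"line one\r\nline two\">body\r\ntext</note>";
        System.out.println(escapeLineBreaksInAttributes(raw));
    }
}
```

The output of this step can then be handed to the parser as usual, and the
attribute value should come back with an LF where each line break was.
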

Hope I am being clear in what I am trying to achieve.

thanks,
Alex

On Sat, Feb 28, 2009 at 10:53 AM, Michael Glavassevich
<mrglavas@ca.ibm.com> wrote:

> I'm not sure what you're asking for. Attribute value normalization [1] is
> part of the parsing process. It occurs before the data is presented to an
> application through any of the standard APIs.
>
> [1] http://www.w3.org/TR/2006/REC-xml-20060816/#AVNormalize
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: mrglavas@ca.ibm.com
> E-mail: mrglavas@apache.org
>
> Aleksandr Kravets <akravets.work@gmail.com> wrote on 02/27/2009 10:07:08
> AM:
>
>
> > Thanks.
> > Are there utilities in Xerces that allow carriage returns
> > normalization easier than let's say parsing the whole document and
> > doing it manually?
>
> > On Thu, Feb 26, 2009 at 6:39 PM, <keshlam@us.ibm.com> wrote:
> > Carriage return is ASCII 13, so &#13; or &#xD; will represent that
> > character.
> >
> > However, be sure you understand XML's rules for whitespace
> > normalization in attribute values. Depending on what you're trying
> > to do, you may want to replace that attribute with a child
> > element... or replace the offending character with some notation
> > that your application, rather than XML, will process appropriately.
> >
> > ______________________________________
> > "... Three things see no end: A loop with exit code done wrong,
> > A semaphore untested, And the change that comes along. ..."
> >  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
> > (http://www.ovff.org/pegasus/songs/threes-rev-11.html)
>
