xerces-j-users mailing list archives

From "Eric J. Schwarzenbach" <Eric.Schwarzenb...@wrycan.com>
Subject Re: XMLSerializer Writer Constructor
Date Tue, 26 Jul 2005 16:26:10 GMT
This has been discussed in these lists before.

I agree with you that having the Writer constructor is a pitfall, since
it is normally the wrong thing to do. It's a particularly easy mistake
to make because developers who aren't familiar with encoding issues
(which is most developers) will almost always choose the Writer over the
Stream because intuitively that's the natural choice for character data.
I've seen *many* developers fall into this pitfall.

It can also produce particularly insidious and harmful bugs, since many
encodings are compatible in the basic character range, so corruption of
the odd extended character may go on for a long time before someone
detects it (perhaps after the original source documents have been
overwritten).
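
To make the failure mode concrete, here is a minimal sketch using only JDK
classes (no Xerces involved). The class and method names are made up for
illustration. It assumes a Writer that encodes as ISO-8859-1 while the XML
declaration claims UTF-8, which is roughly what happens when a serializer is
handed a Writer whose encoding it cannot see.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Xerces or the JDK.
class EncodingMismatchDemo {

    // Writes a document whose declaration claims UTF-8 through a Writer
    // that really encodes with `actual`, and returns the resulting bytes.
    static byte[] writeThroughWriter(String text, Charset actual) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, actual)) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
            w.write("<name>" + text + "</name>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decodes the bytes the way a consumer trusting the declaration would.
    static String readAsDeclared(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ascii = "Jones";
        String accented = "Ren\u00e9e";
        Charset latin1 = StandardCharsets.ISO_8859_1;
        // ASCII-only content survives the mismatch...
        System.out.println(readAsDeclared(writeThroughWriter(ascii, latin1)).contains(ascii));
        // ...but the accented character does not.
        System.out.println(readAsDeclared(writeThroughWriter(accented, latin1)).contains(accented));
    }
}
```

The ASCII-only name round-trips, which is why this kind of bug can sit
unnoticed until the first accented character shows up.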

However, there is one practical problem with deprecating this method. We
cannot assume the developer is serializing to something under his
control and that he has the option of constructing a Stream or a Writer.
He may be interfacing to another system which provides him only a Writer
to serialize to. OutputStreamWriter lets you wrap an OutputStream as a
Writer, but as far as I know there is nothing that wraps a Writer as an
OutputStream.
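
For what it's worth, such a reverse bridge can be sketched on top of
java.nio's CharsetDecoder. The class below is hypothetical (the name
WriterOutputStream is borrowed from Stephen's message); it is not part of the
JDK or of Xerces, it decodes one byte at a time for simplicity, and it
replaces malformed input rather than failing. The charset passed in is
assumed to be the same one the serializer is told to declare and encode with.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical bridge: bytes written here are decoded and forwarded to a Writer.
class WriterOutputStream extends OutputStream {
    private final Writer target;
    private final CharsetDecoder decoder;
    private final ByteBuffer pending = ByteBuffer.allocate(64); // undecoded bytes
    private final CharBuffer chars = CharBuffer.allocate(64);   // decoded chars
    private boolean closed;

    WriterOutputStream(Writer target, Charset charset) {
        this.target = target;
        this.decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    @Override
    public void write(int b) throws IOException {
        if (closed) {
            throw new IOException("stream closed");
        }
        pending.put((byte) b);
        decode(false);
    }

    // Decodes whatever is complete; an unfinished multi-byte sequence
    // stays in `pending` until more bytes arrive.
    private void decode(boolean endOfInput) throws IOException {
        pending.flip();
        CoderResult result;
        do {
            result = decoder.decode(pending, chars, endOfInput);
            drain();
        } while (result.isOverflow());
        pending.compact();
    }

    // Forwards the decoded characters to the wrapped Writer.
    private void drain() throws IOException {
        chars.flip();
        target.write(chars.array(), 0, chars.remaining());
        chars.clear();
    }

    @Override
    public void flush() throws IOException {
        target.flush();
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        decode(true);
        CoderResult result;
        do {
            result = decoder.flush(chars);
            drain();
        } while (result.isOverflow());
        target.close();
    }

    // Convenience helper for trying the bridge out against a StringWriter.
    static String decodeViaBridge(byte[] data, Charset charset) {
        StringWriter sink = new StringWriter();
        try (OutputStream os = new WriterOutputStream(sink, charset)) {
            os.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }
}
```

With something like this, the serializer can always be given an OutputStream,
and the system that only hands out a Writer still receives correctly decoded
characters, at the cost of an encode/decode round trip.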

It does at least deserve a big bold emphatic warning in the JavaDoc though.

The very same issue exists for input on parsing, in the InputSource
Reader constructor. Though I notice the SAX and DOM parser parse methods
don't offer a Reader version, so you have to take that extra step of
creating an InputSource to find it, so it may be somewhat less of a
pitfall for newbies.
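
A small sketch of the input side, again with made-up class and method names
and using the JAXP parser bundled with the JDK rather than any particular
Xerces release. It assumes a document stored as ISO-8859-1 with a matching
declaration, parsed once from a byte stream and once from a Reader that was
built with a guessed charset.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Xerces or the JDK.
class InputSourceDemo {
    static final String TEXT = "Ren\u00e9e";

    // A document that is stored as, and declares, ISO-8859-1.
    static byte[] latin1Document() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>" + TEXT + "</name>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parses the source and returns the text of the root element.
    static String parse(InputSource source) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(source).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Byte stream: the parser reads the declaration and decodes accordingly.
    static String viaByteStream(byte[] doc) {
        return parse(new InputSource(new ByteArrayInputStream(doc)));
    }

    // Reader: the bytes are decoded with `guess` before the parser sees them.
    static String viaReader(byte[] doc, Charset guess) {
        return parse(new InputSource(
                new InputStreamReader(new ByteArrayInputStream(doc), guess)));
    }

    public static void main(String[] args) {
        byte[] doc = latin1Document();
        System.out.println(TEXT.equals(viaByteStream(doc)));
        System.out.println(TEXT.equals(viaReader(doc, StandardCharsets.UTF_8)));
    }
}
```

With the byte stream the parser gets to honour the declaration; with the
Reader, the caller's guess has already been applied by the time the parser
sees any characters.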


Stephen Kestle wrote:

> Stanimir Stamenkov wrote:
>> /Stephen Kestle/:
>>> Of course it is because Characters should not need to specify
>>> encoding, but xml IS NOT a character stream - it's an encoded
>>> character stream [...]
>> Don't want to spam the list but: XML _is_ a character stream. The
>> byte encoded representation is for transport means as most data in
>> the computer world is stored as sequences of bytes. Your sentence
>> seems to me like: "DOM is not an object, it is an 'encoded' XML stream".
> Ok - I sort of agree - the critical point I'm trying to make is that
> it's encoded (when it's valid).  In Java, chars and Strings [IMO]
> imply decoded (i.e. screen-printable) characters.
>>> Please stop the madness - just deprecate it and write a nice javadoc
>>> explaining why.  You'll probably end up saving 1000s of developer
>>> hours on that change alone.
>> You know, I've thought of that many times too, but what if I need to
>> write to a character buffer and not a byte one? I think it is just a
>> problem of less experienced developers who don't realize what
>> actually happens when encoding text to a byte stream, and in
>> particular when serializing XML.
> Well you can always wrap it!  Why make something hard to use for
> people who are learning in favour of what should be exceptional (and
> advanced!) usage.  But then, if you had to make a WriterOutputStream,
> you'd have to wonder why it hasn't been done before (as far as I can
> see).  If you're serializing an object, you're about to make it
> external to the system - and there's no good reason to use a Character
> stream for this (which will export in the default encoding, unless you
> make it a stream of some sort...).
> Most developers will expect it to return encoded data that is valid,
> never realising that their code is fragile and will break when someone
> puts some accidental value on a web screen.  I can't see how this
> isn't broken - the serializer slaps a header on it which claims it's
> encoded - but it isn't!
> When Date was found to be non-transportable, Sun deprecated a whole
> lot of stuff.  Please follow common java conventions (both Writer vs
> Stream, and deprecation) and do the same.
> In Summary: you are technically correct, but think about the social
> [developer] impact of maintaining an "I'm right" stance.  End
> developers surely can't realistically complain about having to make a
> non-standard WriterOutputStream for a non-standard operation.  Even
> the advanced users who would use this would have seen noob code that
> had to be cleaned up (as I have had to do)
> But I am interested to know if there is a use case for this.  Although
> I highly suspect that most use cases would be broken.
> Thanks
> Stephen
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
> For additional commands, e-mail: j-users-help@xerces.apache.org

