cxf-dev mailing list archives

From Aki Yoshida <>
Subject Re: FileUtils.getStringFromFile issue when using XML
Date Wed, 04 May 2011 08:17:24 GMT
Hi Tom,
I think the real problem with this method is that it adds an extra
space at the beginning. If the file content is XML and starts with
the XML declaration, the extra space in front of the declaration
violates well-formedness.
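
To make the well-formedness point concrete, here is a minimal sketch using the standard JAXP parser. The class and method names (LeadingSpaceDemo, isWellFormed) are made up for illustration and are not part of CXF.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of CXF.
class LeadingSpaceDemo {

    /** Returns true if the given text parses as a well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input as not well-formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>";
        System.out.println(isWellFormed(xml));        // declaration at offset 0
        System.out.println(isWellFormed(" " + xml));  // one space before the declaration
    }
}
```

The first call should succeed and the second should fail, because the XML declaration is only allowed at the very start of the document.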

You can create a JIRA issue for this particular bug. But this will not
really help you in the long run. I will explain the reason below.

As I understand your use case, you want to use this method for reading
an XML file and creating its Java String representation in your
application. As far as I can see, this method was not really meant
to be used for such purposes. Furthermore, it seems that this
class is only used in some unit test classes for performing a simple
content comparison.

For your particular use case, you need to take care of the character
encoding and possibly the newline handling. This FileUtils method
ignores the encoding of the file. If the file uses the UTF-8
encoding, you need to read the stream and convert it into a Java String
using UTF-8. If it is in some other encoding such as UTF-16 or
ISO-8859-1, you need to use that encoding for the conversion.
Otherwise, some characters in the resulting String will be corrupted.
Regarding the newline handling, this method currently removes all
CRs and LFs. This is probably okay for the existing test use cases, but for
your use case, you may want to either preserve the newline characters
or normalize them using the standard XML end-of-line rule. So you will
run into further issues if you use this simple method.
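
As a rough illustration of the two points above, the following sketch decodes a file with an explicitly chosen charset and, optionally, applies XML-style end-of-line normalization. It assumes the caller already knows the file's encoding. EncodingAwareReader and its methods are hypothetical names, not CXF API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of CXF.
class EncodingAwareReader {

    /** Decodes raw bytes with the given charset; line breaks are kept as-is. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    /** Reads the whole file and decodes it with the charset the caller supplies. */
    static String readFile(Path file, Charset charset) {
        try {
            return decode(Files.readAllBytes(file), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** XML-style end-of-line handling: each CRLF pair and each lone CR becomes one LF. */
    static String normalizeLineEnds(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }
}
```

For example, readFile(path, StandardCharsets.UTF_8) would preserve the original line breaks, and passing the result through normalizeLineEnds() would leave only LF characters. Decoding with the wrong charset does not throw; it silently yields a String with wrong characters.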

Therefore, I would recommend that you not use this FileUtils method and
instead use an alternative approach that lets an XML parser read the
file for further processing (e.g., using InputSource to work on the
Source or XMLUtils.parse() to work on the Document).
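
A minimal sketch of the parser-based approach, using only standard JAXP rather than the CXF XMLUtils helper (whose exact signature is not shown here). Handing the parser a byte stream lets it pick up the encoding from the XML declaration itself. XmlFileParser is a hypothetical name.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of CXF.
class XmlFileParser {

    /** Parses XML from a byte stream; the parser detects the declared encoding. */
    static Document parse(InputStream in) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(in));
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Opens the file as bytes (no manual decoding) and parses it. */
    static Document parse(File file) {
        try (InputStream in = new FileInputStream(file)) {
            return parse(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this approach there is no intermediate String, so neither the charset nor the newline handling has to be chosen by hand.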

Regards, Aki

2011/5/3 Tom Eastmond <>:
> That would be great to get this fixed - should I create a defect? I'd
> also love to not have it replace a single space with 2 spaces since
> that has caught me by surprise in my testing as well. Let me know what
> you'd like me to do.
> Thanks again,
> Tom Eastmond
> On Tue, May 3, 2011 at 6:19 AM, Aki Yoshida <> wrote:
>> Sorry,
>> I realized this method has actually nothing to do with XML.
>> please ignore my comments on XML normalization.
>> regards, aki
>> 2011/5/3 Aki Yoshida <>:
>>> Hi,
>>> you are right. The normalizeCRLF() method should not add an extra
>>> space at the beginning. We can fix this particular issue.
>>> But there is one open question, as the exact purpose (use case) of
>>> this method is not clear to me. Why do we need a normalization
>>> method that just removes all the CRs and LFs and replaces each
>>> space/tab character with a single space, and why is it
>>> automatically called in FileUtils.getStringFromFile()?
>>> Does anyone else want other normalization options, such as
>>> the standard XML white space "ignore" handling or the
>>> end-of-line handling (i.e., replacing each CRLF pair with a single LF)?
>>> Regards, aki
>>> 2011/5/2 Tom Eastmond <>:
>>>> I was using the FileUtils.getStringFromFile() method for some Camel
>>>> testing and was receiving a SAXParseException: The processing
>>>> instruction target matching "[xX][mM][lL]" is not allowed.].
>>>> It turns out that this was due to the
>>>> FileUtils.normalizeCRLF() method which replaces whitespace characters
>>>> (\s) with two spaces. This method prepends leading spaces to the
>>>> contents (before the <?xml version="1.0" encoding="UTF-8"?> in this
>>>> case) which chokes the XML parser. Would it be feasible to forgo the
>>>> leading spaces at the start of a file in order to avoid this issue?
>>>> I'd be happy to submit a test case/patch if this seems like a valid
>>>> bug/fix. Please let me know if I should use another forum for this
>>>> request.
>>>> Thanks for the excellent work,
>>>> Tom Eastmond
