abdera-dev mailing list archives

From Rick Meyer <R...@gatherinc.com>
Subject Re: Bug with HTML and TEXT content types on org.apache.abdera.model.Content?
Date Wed, 26 Jan 2011 15:37:44 GMT
Hi James,

I was wondering if you could give me some help locating where exactly in the
Abdera code the XML parser is being used to HTML encode the content.
Also, which XML parser is Abdera using by default?

I'm hoping to be able to either configure the existing XML parser to encode
the > character too, or if necessary swap it out for another one that will.
I'm hardly an expert in this stuff, but I would think that XWork2 would be
able to handle this. I don't see that jar in the Abdera dependencies
directory though, so I guess that is not being used here.
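
For reference, here is a minimal sketch of the point made in the quoted reply
below: an unescaped > in character data is well-formed XML, so a receiving
parser should read &lt;p> and &lt;p&gt; as the same text. It also shows why
pre-encoded content comes back double encoded. This is not Abdera code. It
uses the JDK's built-in DOM parser, and the class name EscapeCheck and the
helper textOf are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EscapeCheck {
    // Hypothetical helper: parse a small XML document and return the
    // decoded text content of its root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("not well-formed: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Only '<' escaped, as in the entry that was sent out.
        String partial = textOf("<content>&lt;p></content>");
        // Both '<' and '>' escaped, as originally expected.
        String full = textOf("<content>&lt;p&gt;</content>");
        // Content that was escaped by hand before being serialized.
        String doubled = textOf("<content>&amp;lt;p&amp;gt;</content>");

        System.out.println(partial.equals(full)); // prints "true"
        System.out.println(doubled);              // prints "&lt;p&gt;"
    }
}
```

If the receiving side uses a conforming XML parser, both encodings should
arrive as <p>, so passing the raw, unescaped HTML to the Content object looks
like the safer choice.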



On 12/1/10 12:39 PM, "James Snell" <jasnell@gmail.com> wrote:

> While the &lt;p> encoding is annoying, it is valid. The > character does not
> need to be escaped. Nevertheless, the encoding for this is actually handled
> by the underlying XML parser/serializer and not Abdera itself.
> On Tue, Nov 30, 2010 at 2:32 PM, Rick Meyer <Rick@gatherinc.com> wrote:
>> We are using the Abdera client software to transfer html documents to a
>> client's server.
>> In creating a Content object I have attempted to set the content type to
>> both TEXT and HTML and have run into an issue with each.
>> When I set the content type to HTML, only the '<' char of the included HTML
>> ends up being HTML encoded, so <p> ends up like this: &lt;p>
>> It should be encoded like this, though: &lt;p&gt;
>> Actually when I set the content type to TEXT I get the exact same behavior.
>> So if the text includes <p> what ends up being sent out is &lt;p>
>> Now if I HTML encode the content myself, then the & character ends up being
>> double encoded. So what I end up with is &amp;lt;p&amp;gt;
>> It does this whether I set the Content object's content type to HTML or TEXT.
>> I would expect this last case to occur with HTML, since that should be
>> HTML encoding the data anyway, but not for TEXT.
>> I started using the latest release version of Abdera (1.1) and have now
>> downloaded the latest source and built that myself and both versions have
>> the same behavior.
>> Is it possible to resolve this issue immediately? Otherwise we may have to
>> scrap Abdera and find another solution.
>> Here is an example of what was being sent:
>> <entry
>> xmlns="http://www.w3.org/2005/Atom
>> "><id>281474978492700</id><author><name>Br
>> enda Daverin</name></author><title type="text">US Indicts 11 German
>> Chinese Executives for Honey Smuggling</title><content
>> type="text">&lt;p>For
>> many people with psoriasis, finding safe and effective treatments can be an
>> ever-moving target. There's no cure or universal fix, people respond
>> differently to treatment options, and even when you find a medication - or
>> a
>> combination of them - that works, it may only be effective for a period of
>> time or may need to be stopped to avoid potentially damaging side
>> effects.&lt;/p>&lt;p>"There are a lot of treatments out there and they
>> quite effective, but often they stop being effective," says Dr. Mark
>> Lebwohl, chair of the department of dermatology at Mount Sinai Medical
>> Center in New York City. "There isn't one treatment over a lifetime,
>> necessarily."&lt;/p></content><category /></entry>
