abdera-dev mailing list archives

From James Snell <jasn...@gmail.com>
Subject Re: Bug with HTML and TEXT content types on org.apache.abdera.model.Content?
Date Wed, 26 Jan 2011 16:51:15 GMT
Abdera uses the available, configured JAXP parser. Look in the Parser module
for the code that interfaces with the underlying parser.
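
To see which parser implementation is actually configured, a minimal sketch along these lines may help. It assumes the lookup goes through the standard JAXP factories (javax.xml.parsers for DOM, javax.xml.stream for StAX). The class name JaxpProbe and its helper methods are hypothetical and not part of Abdera.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;

// Hypothetical helper: reports which JAXP implementations the JVM resolves.
class JaxpProbe {
    // Implementation class behind the DOM parser factory.
    static String domFactory() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Implementation class behind the StAX reader factory.
    static String staxReaderFactory() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    // Implementation class behind the StAX writer factory (the serializer side).
    static String staxWriterFactory() {
        return XMLOutputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("DOM factory:         " + domFactory());
        System.out.println("StAX reader factory: " + staxReaderFactory());
        System.out.println("StAX writer factory: " + staxWriterFactory());
    }
}
```

Running this with the same classpath as the application shows whether the JDK's built-in implementation or a bundled third-party one is picked up.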

On Wed, Jan 26, 2011 at 7:37 AM, Rick Meyer <Rick@gatherinc.com> wrote:

> Hi James,
>
> I was wondering if you could give me some help locating where exactly in the
> Abdera code the XML parser is being used to HTML encode the content.
> Also, which XML parser is Abdera using by default?
>
> I'm hoping to be able to either configure the existing XML parser to encode
> the > character too, or if necessary swap it out for another one that will.
> I'm hardly an expert in this stuff, but I would think that XWork2 would be
> able to handle this. I don't see that jar in the Abdera dependencies
> directory though, so I guess that is not being used here.
>
> Thanks,
>
> Rick
>
>
> On 12/1/10 12:39 PM, "James Snell" <jasnell@gmail.com> wrote:
>
> > While the &lt;p> encoding is annoying, it is valid. The > character does not
> > need to be escaped. Nevertheless, the encoding for this is actually handled
> > by the underlying XML parser/serializer and not Abdera itself.
> >
> > On Tue, Nov 30, 2010 at 2:32 PM, Rick Meyer <Rick@gatherinc.com> wrote:
> >
> >> We are using the Abdera client software to transfer html documents to a
> >> client's server.
> >>
> >> In creating a Content object I have attempted to set the content type to
> >> both TEXT and HTML and have run into an issue with each.
> >>
> >> When I set the content type to HTML only the '<' char of the included HTML
> >> ends up being HTML encoded, so <p> ends up like this &lt;p>
> >> It should be encoded like this though &lt;p&gt;
> >>
> >> Actually when I set the content type to TEXT I get the exact same behavior.
> >> So if the text includes <p> what ends up being sent out is &lt;p>
> >>
> >> Now if I HTML encode the content myself, then the & character ends up being
> >> double encoded. So what I end up with is &amp;lt;p&amp;gt;
> >> It does this if I set the Content object's content type to HTML or TEXT.
> >>
> >> I would expect this last case to occur with HTML since that should be
> >> HTML encoding the data anyway, but not for TEXT.
> >>
> >> I started using the latest release version of Abdera (1.1) and have now
> >> downloaded the latest source and built that myself and both versions have
> >> the same behavior.
> >>
> >> Is it possible to resolve this issue immediately? Otherwise we may have to
> >> scrap Abdera and find another solution.
> >>
> >> Here is an example of what was being sent:
> >>
> >> <entry xmlns="http://www.w3.org/2005/Atom">
> >> <id>281474978492700</id>
> >> <author><name>Brenda Daverin</name></author>
> >> <title type="text">US Indicts 11 German and Chinese Executives for Honey
> >> Smuggling</title>
> >> <content type="text">&lt;p>For many people with psoriasis, finding safe and
> >> effective treatments can be an ever-moving target. There's no cure or
> >> universal fix, people respond differently to treatment options, and even
> >> when you find a medication - or a combination of them - that works, it may
> >> only be effective for a period of time or may need to be stopped to avoid
> >> potentially damaging side effects.&lt;/p>&lt;p>"There are a lot of
> >> treatments out there and they are quite effective, but often they stop
> >> being effective," says Dr. Mark Lebwohl, chair of the department of
> >> dermatology at Mount Sinai Medical Center in New York City. "There isn't
> >> one treatment over a lifetime, necessarily."&lt;/p></content>
> >> <category /></entry>
> >>
> >>
>
>
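
The escaping behavior discussed in the quoted thread can be checked with a small sketch like the one below. It is not Abdera code: the class EscapeDemo and its methods are hypothetical, and the hand-written escape functions only imitate what a serializer might do. It illustrates two points from the thread: a receiving XML parser decodes &lt;p> and &lt;p&gt; to the same text, and pre-escaping content before handing it to a serializer produces the double-encoded &amp;lt;p&amp;gt; form.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the escape helpers imitate a serializer, they are not Abdera's.
class EscapeDemo {
    // Escapes only '&' and '<', which is what the thread reports seeing on the wire.
    static String escapeMinimal(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    // Additionally escapes '>', which is allowed but not required here.
    static String escapeFull(String s) {
        return escapeMinimal(s).replace(">", "&gt;");
    }

    // Parses a small XML document and returns the root element's decoded text.
    static String parseText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String minimal = escapeMinimal("<p>");            // &lt;p>
        String full = escapeFull("<p>");                  // &lt;p&gt;
        String doubled = escapeMinimal(full);             // &amp;lt;p&amp;gt;

        // Both single-escaped forms decode to the same text on the receiving side.
        System.out.println(parseText("<content>" + minimal + "</content>")); // <p>
        System.out.println(parseText("<content>" + full + "</content>"));    // <p>

        // Pre-escaped input gets escaped again, so the receiver sees literal entities.
        System.out.println(doubled);
        System.out.println(parseText("<content>" + doubled + "</content>")); // &lt;p&gt;
    }
}
```

If the receiving side uses a conforming XML parser, the two single-escaped forms are interchangeable, so passing raw (unescaped) content to the serializer is the safer choice.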
