commons-issues mailing list archives

From "Rob Tompkins (JIRA)" <>
Subject [jira] [Commented] (TEXT-40) Escape HTML characters only once
Date Sun, 19 Feb 2017 02:31:44 GMT


Rob Tompkins commented on TEXT-40:

I like the technique there. Its simplicity carries value. The interesting questions come from
what to do with a string that has already been through the plain (not escape-once) method several times,
that is, a string that has been "escaped" some arbitrary finite number of times. Personally I like the
idea of having an escape-once method, but I just want to make sure we consider all of the implications.

The thoughts that crossed my mind today follow (taken from my dev list email):

In preparation for the 1.0 release, I think we should address Sebb's concern in
TEXT-40 about the attempt to create "idempotent" string escape methods. By
idempotent I mean someMethod("some string") =
someMethod(someMethod(someMethod(...someMethod("some string")))); that is, a
single application of a method gives the same result as any number of
applications of the method on the same input.
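
To make that definition concrete, here is a minimal sketch of a check for it.
The names isIdempotentOn and naiveEscape are hypothetical, and naiveEscape is
only a stand-in that handles '&' and '<'; it is not the Commons Text
implementation. Comparing one application against two is enough, since
f(f(x)) = f(x) implies the same for any larger number of applications.

```java
import java.util.function.UnaryOperator;

public class IdempotencyCheck {
    // True when applying f twice to input gives the same result as applying it once.
    public static boolean isIdempotentOn(UnaryOperator<String> f, String input) {
        String once = f.apply(input);
        return once.equals(f.apply(once));
    }

    // Hypothetical stand-in for an HTML escaper; handles only '&' and '<'.
    public static String naiveEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    public static void main(String[] args) {
        // String.trim is idempotent on this input; the naive escaper is not,
        // because its second pass turns "&lt;" into "&amp;lt;".
        System.out.println(isIdempotentOn(String::trim, "  a < b  "));
        System.out.println(isIdempotentOn(IdempotencyCheck::naiveEscape, "a < b"));
    }
}
```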

Below I lay out a mechanism by which it is possible to write such methods, but I
don’t know the value in writing such methods. I'm merely expressing that
idempotency is a possibility.

For string "un-escaping", I believe that we can indeed write an idempotent
method by simply re-running the un-escape method until the string remains
unchanged between applications, which takes only a finite number of
un-escapings. (I believe that I can write a proof that
all un-escape methods have such a point, if that is needed for the sake of
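
As a rough sketch of that fixed-point idea (not a proposal for the actual
API), the loop below keeps un-escaping until nothing changes. The class and
method names are hypothetical, and unescapeOnce is a stand-in that knows only
four entities. In this stand-in, every pass that changes the string also
shortens it, so the loop has to stop after finitely many passes.

```java
public class FullyUnescape {
    // Hypothetical stand-in for an HTML un-escaper: one left-to-right pass
    // that decodes four entities and copies everything else unchanged.
    public static String unescapeOnce(String s) {
        String[][] entities = {{"&amp;", "&"}, {"&lt;", "<"}, {"&gt;", ">"}, {"&quot;", "\""}};
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            boolean matched = false;
            for (String[] e : entities) {
                if (s.startsWith(e[0], i)) {
                    out.append(e[1]);
                    i += e[0].length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append(s.charAt(i++));
            }
        }
        return out.toString();
    }

    // Repeats unescapeOnce until the string stops changing between passes.
    public static String unescapeFully(String s) {
        String current = s;
        String next = unescapeOnce(current);
        while (!next.equals(current)) {
            current = next;
            next = unescapeOnce(current);
        }
        return current;
    }
}
```

With this stand-in, unescapeFully("100kg &amp;amp;lt; 1000kg") needs three
changing passes to reach "100kg < 1000kg", and calling it again on that result
changes nothing.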

If indeed we can create an idempotent un-escape method, then we can simply
run that method and then run the escaping method one time. If we always
completely unescape and then escape once, then we do have an idempotent method.
Such a method might not be all that valuable to the user, though. Furthermore,
this just explains one way to create such an idempotent method. Whether other,
or more valuable, methods exist would take some more thought.
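
A minimal sketch of the "completely unescape, then escape once" construction
follows. All names here are hypothetical, and escape/unescape are simplified
stand-ins covering four characters, not the StringEscapeUtils methods.

```java
public class EscapeOnce {
    // Hypothetical stand-in escaper; '&' is replaced first so that the
    // entities produced by the later replacements are not escaped again.
    public static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Hypothetical stand-in un-escaper; "&amp;" is handled last so that
    // "&amp;lt;" becomes "&lt;" rather than "<" in a single call.
    public static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&amp;", "&");
    }

    // Un-escapes until the string stops changing, then escapes exactly once.
    public static String escapeOnce(String s) {
        String current = s;
        String next = unescape(current);
        while (!next.equals(current)) {
            current = next;
            next = unescape(current);
        }
        return escape(current);
    }

    public static void main(String[] args) {
        String once = escapeOnce("100kg &lt; 1000kg");
        System.out.println(once);
        System.out.println(once.equals(escapeOnce(once)));
    }
}
```

The cost of this approach shows up when the input is meant to contain a
literal entity. For the text "Use &lt; for <", a plain escape would produce
"Use &amp;lt; for &lt;", while escapeOnce produces "Use &lt; for &lt;" and
the literal "&lt;" the writer intended to display is lost.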

Anyone have any thoughts? My feeling is that it might be more effort than it's
worth to ensure that any string is only "singly encoded." Further, we probably
should take a look at the "escape_once" methods in StringEscapeUtils.

> Escape HTML characters only once
> --------------------------------
>                 Key: TEXT-40
>                 URL:
>             Project: Commons Text
>          Issue Type: Improvement
>            Reporter: Sampanna Kahu
>            Assignee: Rob Tompkins
>            Priority: Minor
>              Labels: features, newbie
> If already escaped HTML characters are in the input text, they get escaped again using
> For example:
> If the input is:
> 100 kg & l t ; 1000kg <without the spaces>
> Then the output of escapeHtml4() becomes:
> 100kg & amp ; l t ; 1000kg <without the spaces>
> At my workplace, we felt the need for a method in StringEscapeUtils which does not escape
> already escaped characters.
> I have attempted to create this method. Creating a pull request soon.

This message was sent by Atlassian JIRA
