commons-dev mailing list archives

From Duncan Jones <dun...@wortharead.com>
Subject Re: [text][lang] string escaping
Date Sat, 19 Nov 2016 16:08:37 GMT


> On 19 Nov 2016, at 15:38, Rob Tompkins <chtompki@gmail.com> wrote:
> 
> 
>> On Nov 19, 2016, at 6:33 AM, Benedikt Ritter <britter@apache.org> wrote:
>> 
>> Hello Gary,
>> 
>> Gary Gregory <garydgregory@gmail.com> wrote on Sat, 19 Nov 2016 at
>> 01:07:
>> 
>>> Just a thought:
>>> 
>>> Does all the current (and future) string escaping code (XML, HTML, ...)
>>> really belong in [lang]? Would it be more natural to have it in [text]?
>>> 
>> 
>> My current view on the whole thing is that we put stuff that is related
>> to strings in Lang. Code that works on texts should go to Text. To me a
>> text is more than just a string. A text contains words that make up
>> sentences, which in turn build paragraphs.
>> 
>> Using this description, I'd argue that escaping belongs in Lang and not
>> in Text, because it works on individual characters rather than on texts.
> 
> I think this is a difficult distinction to draw, because fundamentally anything that
> does sufficient text processing necessarily operates on a character-by-character basis.
> I propose below a distinction more along the lines of potential usage.
> 
>> 
>> But this would also raise the question of whether the various edit distance
>> algorithms work on texts or on strings. So maybe my distinction is not
>> good at all.
>> 
>> Do we need to better specify the scope of text?
> 
> I definitely agree with the sentiment that we should find a clear line of distinction
> between Lang and Text with regard to strings. Some thoughts that spring to mind are more
> in terms of how the algorithms are to be used.
> 
> So let’s consider the two extremes of the spectrum of string/word/text algorithms.
> On one hand, we have utilities like “StringUtils.isBlank(String s)”, which is used
> ubiquitously in day-to-day code and is a foundational extension of Java. On the other
> hand, we have algorithms like natural language processing or statistical processing of
> words for analysis of biological sequences (two chapters in M. Lothaire’s “Applied
> Combinatorics on Words”). The first extreme points towards day-to-day usage in any
> variety of Java applications, whereas the other points to an application that is
> specifically designed for string/word/text processing. I don’t see folks in everyday
> usage wanting to find the edit distance between two strings unless they’re writing
> something specifically doing text processing or something of that nature.
> 
> Now clearly the problem with this distinction is the amount of grey area that it leaves
> in figuring out what goes where, so I don’t know if it’s the right way to go. It was just
> the thought that came to mind.
> 
> Any thoughts out there?

I think you're on the right track here. Lang is supposed to plug the gaps in Java's core packages.
A certain amount of text manipulation is expected in many applications, but once we get into
the realms of statistical analysis or fuzzy comparison methods then we've moved beyond that.


Perhaps a tongue-in-cheek definition would be: "if you had to consult a book to write that,
it belongs in Text".
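
To make the two ends of that spectrum concrete, here is a minimal sketch in plain Java with
no Commons dependency. The class and method names are hypothetical and this is not the
Commons Lang or Commons Text code; it only illustrates a gap-plugging blank check next to an
edit distance (Levenshtein, chosen here as one example of a fuzzy comparison method).

```java
// Hypothetical illustration only; not the Commons Lang or Commons Text implementation.
final class LangVersusTextSketch {

    private LangVersusTextSketch() {
    }

    // "Lang"-style helper: a small null-safe check that plugs a gap in the core API.
    static boolean isBlank(CharSequence cs) {
        if (cs == null) {
            return true;
        }
        for (int i = 0; i < cs.length(); i++) {
            if (!Character.isWhitespace(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // "Text"-style algorithm: Levenshtein edit distance using two rolling rows.
    static int levenshtein(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b is j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // i deletions turn a's prefix into the empty string
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(isBlank("   "));                   // true
        System.out.println(levenshtein("kitten", "sitting")); // 3
    }
}
```

The first method is something most people would write from memory; the second is the kind of
thing you would probably look up, which is the informal dividing line suggested above.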

Duncan

> 
> Cheers,
> -Rob
> 
>> 
>> Benedikt
>> 
>> 
>>> 
>>> Gary
>>> 
>>> --
>>> E-Mail: garydgregory@gmail.com | ggregory@apache.org
>>> Blog: http://garygregory.wordpress.com
>>> Home: http://garygregory.com/
>>> Tweet! http://twitter.com/GaryGregory
>>> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


