commons-issues mailing list archives

From chtompki <...@git.apache.org>
Subject [GitHub] commons-lang issue #251: LANG-1300: Change CharSequenceUtils to support supp...
Date Tue, 07 Mar 2017 13:18:48 GMT
Github user chtompki commented on the issue:

    https://github.com/apache/commons-lang/pull/251
  
    @dmjones500 - no worries on being busy, we all end up there from time to time... :-)

    
    @dmjones500 has an interesting point. The problem seems to lie with the number of "Supplementary
Code Points" preceding the *findable* `searchChar` that have already been split into their
surrogate pairs.  
    
    You may need to consider using `Character.isSurrogate(char ch)` as well as `Character.isSurrogatePair(char
high, char low)` for all characters preceding our *findable* code point. Granted, that adds
an *O(n)* multiplier to our method's cost, pushing us to *O(n<sup>2</sup>)*.
It feels like only then can we be absolutely certain that we are not overcounting by using *code
units* as opposed to *code points*. 
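
    To make the code unit versus code point distinction concrete, here is a rough sketch, not the code from this PR. The class `CodePointScan` and both method names are hypothetical; only the `java.lang.Character` calls are real JDK API.

```java
public final class CodePointScan {
    private CodePointScan() {}

    /**
     * Hypothetical helper: counts the code points in cs[0, endIndex),
     * treating each well-formed surrogate pair as a single code point.
     */
    public static int codePointsBefore(CharSequence cs, int endIndex) {
        int count = 0;
        int i = 0;
        while (i < endIndex) {
            char ch = cs.charAt(i);
            // A high surrogate followed by a low surrogate is one code point.
            if (i + 1 < endIndex && Character.isSurrogatePair(ch, cs.charAt(i + 1))) {
                i += 2;
            } else {
                i += 1; // BMP char, or an unpaired surrogate counted on its own
            }
            count++;
        }
        return count;
    }

    /**
     * Hypothetical helper: returns the code-unit index of the first occurrence
     * of searchCodePoint at or after start, or -1 if it is absent.
     */
    public static int indexOfCodePoint(CharSequence cs, int searchCodePoint, int start) {
        int i = Math.max(start, 0);
        while (i < cs.length()) {
            int cp = Character.codePointAt(cs, i);
            if (cp == searchCodePoint) {
                return i;
            }
            i += Character.charCount(cp); // advance 1 or 2 code units
        }
        return -1;
    }
}
```

    For the string `"a"` + U+1F600 + `"b"`, `indexOfCodePoint(s, 'b', 0)` gives the code-unit index 3, while `codePointsBefore(s, 3)` reports 2 code points ahead of it. If the preceding characters only need to be examined once per search, a single left-to-right pass like this might keep the cost linear, though whether that fits the method under review is an open question.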
    
    If we do move in this direction, we should be quite clear, in the javadoc, that there
is a notable performance reduction when operating outside the "Basic Multilingual Plane" (ref.
[Oracle's Character documentation](https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#supplementary)).
    
    @PascalSchumacher - you have any thoughts here as well?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
