Date: Tue, 7 Mar 2017 13:19:38 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@commons.apache.org
Subject: [jira] [Commented] (LANG-1300) Clarify or improve behaviour of int-based methods in StringUtils

    [ https://issues.apache.org/jira/browse/LANG-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899421#comment-15899421 ]

ASF GitHub Bot commented on LANG-1300:
--------------------------------------

Github user chtompki commented on the issue:

    https://github.com/apache/commons-lang/pull/251

@dmjones500 - no worries about being busy, we all end up there from time to time... :-)

@dmjones500 has an interesting point. The problem seems to lie with the number of "Supplementary Code Points" preceding the *findable* `searchChar` that have previously been split into their complementary surrogate pairs. You may need to consider using `Character.isSurrogate(char ch)` as well as `Character.isSurrogatePair(char high, char low)` for all characters preceding our *findable* code point. Granted, that adds an *O(n)* multiplier on our method's efficiency, pushing us to *O(n^2)*. It feels like only then can we be absolutely certain that we are not overcounting by using *code units* as opposed to *code points*.
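
The surrogate check described above can be sketched as follows. This is only an illustration under stated assumptions: `CodePointSearch`, `indexOfCodePoint` and `codePointsBefore` are hypothetical names, not part of Commons Lang or of the pull request. The sketch uses only `java.lang.Character` methods and shows how a code-unit index differs from a code-point count when supplementary characters precede the match.

```java
// Hypothetical helper for illustration only; not Commons Lang API.
final class CodePointSearch {

    private CodePointSearch() { }

    /**
     * Returns the code-unit index of the first occurrence of codePoint in seq,
     * or -1 if it is absent. The sequence is walked one code point at a time,
     * so a surrogate pair is always compared as a whole.
     */
    static int indexOfCodePoint(CharSequence seq, int codePoint) {
        int i = 0;
        while (i < seq.length()) {
            int current = Character.codePointAt(seq, i);
            if (current == codePoint) {
                return i;
            }
            i += Character.charCount(current); // 2 for supplementary, else 1
        }
        return -1;
    }

    /**
     * Counts the code points in seq[0, endIndex), treating each valid
     * surrogate pair as a single code point.
     */
    static int codePointsBefore(CharSequence seq, int endIndex) {
        int count = 0;
        int i = 0;
        while (i < endIndex) {
            boolean pair = i + 1 < endIndex
                    && Character.isSurrogatePair(seq.charAt(i), seq.charAt(i + 1));
            i += pair ? 2 : 1;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        sb.appendCodePoint(0x2070E).append("ab"); // one supplementary char, then 'a', 'b'

        int unitIndex = indexOfCodePoint(sb, 'b');
        System.out.println(unitIndex);                       // 3 (code units)
        System.out.println(codePointsBefore(sb, unitIndex)); // 2 (code points)
        System.out.println(indexOfCodePoint(sb, 0x2070E));   // 0
    }
}
```

Each method here makes a single pass over the sequence. The JDK's `Character.codePointCount(CharSequence, int, int)` could stand in for `codePointsBefore`; the explicit loop is kept only to show where `Character.isSurrogatePair` comes into play.
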
If indeed we do move in this direction, we should be quite clear, in the javadoc, that there is a notable performance reduction when operating outside the "Basic Multilingual Plane" (ref. [Oracle's Character documentation](https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#supplementary)).

@PascalSchumacher - do you have any thoughts here as well?

> Clarify or improve behaviour of int-based methods in StringUtils
> ----------------------------------------------------------------
>
>                 Key: LANG-1300
>                 URL: https://issues.apache.org/jira/browse/LANG-1300
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>    Affects Versions: 3.5
>            Reporter: Duncan Jones
>            Priority: Minor
>             Fix For: Discussion
>
>
> The following methods use an {{int}} to represent a search character:
> {code:java}
> boolean contains(final CharSequence seq, final int searchChar)
> int indexOf(final CharSequence seq, final int searchChar)
> int indexOf(final CharSequence seq, final int searchChar, final int startPos)
> int lastIndexOf(final CharSequence seq, final int searchChar)
> int lastIndexOf(final CharSequence seq, final int searchChar, final int startPos)
> {code}
> When I see an {{int}} representing a character, I tend to assume the method can handle supplementary characters. However, the current behaviour of these methods depends upon whether the {{CharSequence}} is a {{String}} or not.
> {code:java}
> StringBuilder builder = new StringBuilder();
> builder.appendCodePoint(0x2070E);
> System.out.println(StringUtils.lastIndexOf(builder, 0x2070E)); // -1
> System.out.println(StringUtils.lastIndexOf(builder.toString(), 0x2070E)); // 0
> {code}
> The Javadoc for these methods is ambiguous on this point, stating:
> {quote}
> This method uses {{String.lastIndexOf(int)}} if possible.
> {quote}
> I think we should consider updating the {{CharSequenceUtils}} methods used by this class to convert all {{CharSequence}} parameters to strings, enabling full code point support.
> The docs could be updated to make this crystal clear.
> There is a question of whether this breaks backwards compatibility.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)