Return-Path:
X-Original-To: apmail-commons-issues-archive@minotaur.apache.org
Delivered-To: apmail-commons-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 460A4D11D for ; Sat, 24 Nov 2012 23:29:00 +0000 (UTC)
Received: (qmail 77894 invoked by uid 500); 24 Nov 2012 23:28:59 -0000
Delivered-To: apmail-commons-issues-archive@commons.apache.org
Received: (qmail 77773 invoked by uid 500); 24 Nov 2012 23:28:59 -0000
Mailing-List: contact issues-help@commons.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: issues@commons.apache.org
Delivered-To: mailing list issues@commons.apache.org
Received: (qmail 77640 invoked by uid 99); 24 Nov 2012 23:28:59 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 24 Nov 2012 23:28:59 +0000
Date: Sat, 24 Nov 2012 23:28:59 +0000 (UTC)
From: "Michael Knapp (JIRA)"
To: issues@commons.apache.org
Message-ID: <881518492.20846.1353799739411.JavaMail.jiratomcat@arcas>
In-Reply-To: <627548337.20059.1353728338249.JavaMail.jiratomcat@arcas>
Subject: [jira] [Comment Edited] (LANG-860) String split with an escape pattern
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/LANG-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503436#comment-13503436 ]

Michael Knapp edited comment on LANG-860 at 11/24/12 11:28 PM:
---------------------------------------------------------------

I beg to differ: commons-csv assumes there can be an escape character, whereas my code assumes there can be an escape pattern. My code handles a much broader range of problems than CSV. For example, what if you want to get all the parenthesized text out of a document?
commons-csv cannot do that because '(' and ')' are different characters. Commons-csv offers no method to retain the delimiters that you split on; my code does. Say you split on a pattern matching open and closed parentheses: no existing split function in commons-lang, and no function in commons-csv, can retain the text that matched your regular expression delimiter, but my code can. The code I wrote does not replace commons-csv, nor does it try to. Commons-csv handles comments, empty lines, trimming text, and a whole lot more that is out of the scope of my code. Also, if you expect anybody to use commons-csv, you should really put it on the central Maven repository and document it a little more.

      was (Author: msknapp):
    I beg to differ: commons-csv assumes there can be an escape character, whereas my code assumes there can be an escape pattern. My code handles a much broader range of problems than CSV. For example, what if you want to get all the parenthesized text out of a document? commons-csv cannot do that because '(' and ')' are different characters. Commons-csv offers no method to retain the delimiters that you split on; my code does. Say you split on a pattern matching open and closed parentheses: no existing split function in commons-lang, and no function in commons-csv, can retain the text that matched your delimiter, but my code can. The code I wrote does not replace commons-csv, nor does it try to. Commons-csv handles comments, empty lines, trimming text, and a whole lot more that is out of the scope of my code. Also, if you expect anybody to use commons-csv, you should really put it on the central Maven repository and document it a little more.
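
For readers following the thread, here is a rough illustration of the two behaviours argued for above: splitting on a delimiter unless it falls inside an escape pattern, and splitting on a regular expression while keeping the matched delimiter text. This is only a sketch under stated assumptions and is not the code in StringUtilsSplitEscapingly.patch; the class name `EscapedSplitSketch` and the methods `splitEscaped` and `splitKeepingDelimiters` are hypothetical names made up for this example, and it uses only `java.util.regex` from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; NOT the implementation attached to LANG-860. */
public class EscapedSplitSketch {

    /**
     * Splits text on a literal delimiter, ignoring any delimiter that
     * starts inside a region matched by the escape pattern.
     */
    public static List<String> splitEscaped(String text, String delimiter, Pattern escape) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> parts = new ArrayList<>();
        Matcher escaped = escape.matcher(text);
        boolean hasEscape = escaped.find();
        int start = 0; // start of the token being collected
        int pos = 0;   // where the next delimiter search begins
        while (true) {
            int hit = text.indexOf(delimiter, pos);
            if (hit < 0) {
                break;
            }
            // Move to the first escaped region that ends after the hit.
            while (hasEscape && escaped.end() <= hit) {
                hasEscape = escaped.find();
            }
            if (hasEscape && escaped.start() <= hit) {
                // The delimiter is escaped: resume searching after the region.
                pos = escaped.end();
            } else {
                parts.add(text.substring(start, hit));
                start = hit + delimiter.length();
                pos = start;
            }
        }
        parts.add(text.substring(start));
        return parts;
    }

    /**
     * Splits text on a regular expression and keeps each non-empty match
     * as its own element, in document order.
     */
    public static List<String> splitKeepingDelimiters(String text, Pattern delimiter) {
        List<String> parts = new ArrayList<>();
        Matcher m = delimiter.matcher(text);
        int start = 0;
        while (m.find()) {
            if (m.end() == m.start()) {
                continue; // ignore empty matches
            }
            if (m.start() > start) {
                parts.add(text.substring(start, m.start()));
            }
            parts.add(m.group());
            start = m.end();
        }
        if (start < text.length()) {
            parts.add(text.substring(start));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Quoted sections escape the comma; prints [a, "b,c", c]
        System.out.println(splitEscaped("a,\"b,c\",c", ",", Pattern.compile("\"[^\"]*\"")));
        // Parenthesized text is the delimiter and is retained;
        // prints [see , (one),  and , (two), .]
        System.out.println(splitKeepingDelimiters("see (one) and (two).",
                Pattern.compile("\\([^)]*\\)")));
    }
}
```

The first call reproduces the example from the issue description quoted below. The sketch makes no attempt at nested parentheses, unbalanced quotes, or delimiters that straddle the edge of an escaped region; how the actual patch treats those cases is not stated in this message.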
> String split with an escape pattern
> -----------------------------------
>
>                 Key: LANG-860
>                 URL: https://issues.apache.org/jira/browse/LANG-860
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>            Reporter: Michael Knapp
>            Priority: Minor
>              Labels: patch, split
>         Attachments: StringUtilsSplitEscapingly.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Often there are strings which are delimited, but certain patterns can escape the delimiter. For example, quotes are used in CSV to escape a comma delimiter. I have written a couple of methods for StringUtils that split strings while considering the possibility of an escape pattern. For example, when given "a,\"b,c\",c", it will produce {"a","\"b,c\"","c"}. In my code, the delimiter can be a string, and it can be escaped by any regular expression pattern. Unit tests are already written and passing.
> I plan to attach the patch for this once the ticket is created. I just need a committer to review the patch, approve it, and commit it for me.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira