From: jcmcote
To: dev@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [GitHub] drill pull request #512: Drill 4573 fix issue with unicode chars
Content-Type: text/plain
Message-Id: <20160609013903.BBDE6DFFAB@git1-us-west.apache.org>
Date: Thu, 9 Jun 2016 01:39:03 +0000 (UTC)

Github user jcmcote commented on a diff in the pull request:

    https://github.com/apache/drill/pull/512#discussion_r66370984

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/CharSequenceWrapper.java ---
    @@ -17,13 +17,52 @@
      */
     package org.apache.drill.exec.expr.fn.impl;
    +import java.nio.ByteBuffer;
    +import java.nio.CharBuffer;
    +import java.nio.charset.CharacterCodingException;
    +import java.nio.charset.Charset;
    +import java.nio.charset.CharsetDecoder;
    +import java.nio.charset.CoderResult;
    +import java.util.regex.Matcher;
    +
     import io.netty.buffer.DrillBuf;
    +/**
    + * A CharSequence is a readable sequence of char values. This interface provides
    + * uniform, read-only access to many different kinds of char sequences. A char
    + * value represents a character in the Basic Multilingual Plane (BMP) or a
    + * surrogate. Refer to Unicode Character Representation for details.
    + * Specifically this implementation of the CharSequence adapts a Drill
    + * {@link DrillBuf} to the CharSequence. The implementation is meant to be
    + * re-used that is allocated once and then passed DrillBuf to adapt. This can be
    + * handy to exploit API that consume CharSequence avoiding the need to create
    + * string objects.
    + *
    + */
     public class CharSequenceWrapper implements CharSequence {
    +  // The adapted drill buffer (in the case of US-ASCII)
    +  private DrillBuf buffer;
    +  // The converted bytes in the case of non ASCII
    +  private CharBuffer charBuffer;
    --- End diff --

The CharSequenceWrapper does the same work as a Java String: it converts the UTF-8 bytes to a sequence of chars. However, it reuses the CharBuffer instead of constantly allocating new arrays and leaving them to be garbage collected. This will eliminate a lot of churn in the GC.
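
To make the reuse idea concrete, here is a minimal sketch of the technique described above. It is not the code from the pull request: the class name `ReusableUtf8Chars` and the method `setBytes` are hypothetical, and a plain `java.nio.ByteBuffer` stands in for `DrillBuf` so that the example runs without Drill on the classpath. One decoder and one `CharBuffer` are allocated up front and refilled for every value.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not the CharSequenceWrapper from the pull request. */
final class ReusableUtf8Chars implements CharSequence {
    // Allocated once and reused for every value that is decoded.
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    private CharBuffer chars;

    ReusableUtf8Chars(int initialCapacity) {
        chars = CharBuffer.allocate(initialCapacity);
        chars.limit(0); // start out as an empty sequence
    }

    /** Decodes the remaining bytes of utf8 into the reused char buffer. */
    void setBytes(ByteBuffer utf8) {
        // UTF-8 never decodes to more chars than it has bytes, so a buffer
        // with one char per input byte is always large enough.
        int needed = utf8.remaining();
        if (chars.capacity() < needed) {
            chars = CharBuffer.allocate(needed); // grow only when required
        }
        chars.clear();
        decoder.reset();
        try {
            CoderResult result = decoder.decode(utf8, chars, true);
            if (!result.isUnderflow()) {
                result.throwException();
            }
            result = decoder.flush(chars);
            if (!result.isUnderflow()) {
                result.throwException();
            }
        } catch (CharacterCodingException e) {
            chars.limit(0); // leave the sequence empty after a failed decode
            throw new IllegalArgumentException("input is not valid UTF-8", e);
        }
        chars.flip(); // position 0, limit = number of decoded chars
    }

    @Override
    public int length() {
        return chars.remaining();
    }

    @Override
    public char charAt(int index) {
        return chars.get(index); // absolute read, bounds-checked against the limit
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Copy, so the result stays valid after the next setBytes call.
        return chars.subSequence(start, end).toString();
    }

    @Override
    public String toString() {
        return chars.toString();
    }

    public static void main(String[] args) {
        ReusableUtf8Chars seq = new ReusableUtf8Chars(64);
        // The matcher is bound to the wrapper once and reset for each new value.
        Matcher matcher = Pattern.compile("h.llo").matcher(seq);
        String[] inputs = {"h\u00e9llo", "world"};
        for (String input : inputs) {
            seq.setBytes(ByteBuffer.wrap(input.getBytes(StandardCharsets.UTF_8)));
            matcher.reset();
            System.out.println(seq.length() + " chars, matches: " + matcher.matches());
        }
    }
}
```

In steady state the only allocation per value in this sketch is the occasional buffer growth. A `new String(bytes, UTF_8)` call, by contrast, allocates a fresh backing array every time.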