From: Josh Elser
Date: Mon, 29 Oct 2012 21:57:28 -0400
To: dev@accumulo.apache.org
Subject: Re: Setting Charset in getBytes() call.

I'm saying that I don't know of anything in the core API which performs
a getBytes() on the data itself. Accumulo itself is agnostic, dealing only
in byte[]. I think we're saying the same thing.

On 10/29/2012 8:54 PM, Benson Margulies wrote:
> On Mon, Oct 29, 2012 at 8:46 PM, Josh Elser wrote:
>> +1 Mike.
>>
>> 1. It would be hard for me to believe Key/Value are ever handled internally
>> in terms of Strings, but, if such a case does exist, it would be extremely
>> prudent to fix.
>>
>> 2. FWIW, the Shell does use ISO-8859-1 as its charset which is referenced by
>> other commands [1,2]. It would be good to double check all of the other
>> commands.
>
> I'm a bit lost. Any possible Java String can be rendered in UTF-8. So,
> if you are calling String.getBytes to turn a string into some bytes
> for some purpose, I think you need UTF-8.
>
> On the other hand, as Mike pointed out, new String(somebytes, "utf-8")
> will destroy data for some byte values that are not, in fact, UTF-8.
> But why would Accumulo ever need to string-ify some array of bytes of
> uncertain parentage?
>
>
>>
>> [1]
>> https://github.com/apache/accumulo/blob/trunk/core/src/main/java/org/apache/accumulo/core/util/shell/Shell.java
>> [2]
>> https://github.com/apache/accumulo/blob/trunk/core/src/main/java/org/apache/accumulo/core/util/shell/commands/InsertCommand.java
>>
>>
>> On 10/29/2012 8:27 PM, Michael Flester wrote:
>>>
>>> I agree with Benson entirely with one caveat. It seems to me that there
>>> might be two categories of things being discussed
>>>
>>> 1. User data (keys and values)
>>> 2. Ancillary things needed for operation of Accumulo (passwords).
>>>
>>> These could well be considered separately. Trying to do anything with
>>> keys and values other than treating them as bytes all of the time
>>> I find quite scary.
>>>
>>> And if this is only being done to satisfy pmd or findbugs, those tools
>>> can be convinced to modify their reporting about this issue.
>>>
>>
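
For anyone following the thread, here is a minimal sketch of the round-trip behavior discussed above. It is not Accumulo code: the class name CharsetRoundTrip and its two helper methods are made up for illustration, and it assumes Java 7 or later for StandardCharsets. It shows that decoding arbitrary bytes as UTF-8 and re-encoding them can change the bytes, while ISO-8859-1 (the charset the Shell is said to use) maps every byte value to exactly one char and back.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: hypothetical helper class, not part of Accumulo.
public class CharsetRoundTrip {

    // True if decoding raw as UTF-8 and re-encoding yields the same bytes.
    public static boolean survivesUtf8(byte[] raw) {
        // Malformed sequences are silently replaced with U+FFFD here.
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return Arrays.equals(raw, decoded.getBytes(StandardCharsets.UTF_8));
    }

    // True if decoding raw as ISO-8859-1 and re-encoding yields the same bytes.
    public static boolean survivesLatin1(byte[] raw) {
        // Each byte 0x00..0xFF maps to the char with the same numeric value.
        String decoded = new String(raw, StandardCharsets.ISO_8859_1);
        return Arrays.equals(raw, decoded.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // Bytes "of uncertain parentage": 0xC3 0x28 and 0xFF are not valid UTF-8.
        byte[] opaque = {(byte) 0xC3, (byte) 0x28, (byte) 0xFF, 0x00};

        System.out.println("UTF-8 round trip intact:      " + survivesUtf8(opaque));   // false
        System.out.println("ISO-8859-1 round trip intact: " + survivesLatin1(opaque)); // true

        // Going the other way, ordinary text encodes to UTF-8 and back unchanged.
        String text = "caf\u00e9";
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("Text round trip intact:       "
                + text.equals(new String(encoded, StandardCharsets.UTF_8)));           // true
    }
}
```

Passing an explicit Charset to getBytes() also avoids depending on the platform default, which I assume is what the pmd/findbugs warning is about. For keys and values, the safer course is still the one argued above: keep them as byte[] and never round-trip them through a String.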