Date: Sun, 1 Aug 2010 23:14:54 -0700
Subject: Re: Hive support for latin1
From: Zheng Shao
To: hive-user@hadoop.apache.org

Just change FetchTask.java:

  public boolean fetch(ArrayList res)
    res.add(((Text) mSerde.serialize(io.o, io.oi)).toString());

Instead of using Text.toString(), use your own method to convert from
raw bytes to a Unicode String.

Zheng

On Sun, Aug 1, 2010 at 8:31 PM, bc Wong wrote:
> Hi all,
>
> I'm trying to figure out how to query Hive on latin1 encoded data.
>
> I created a file with 256 characters, with unicode value 0-255,
> encoded in latin1. I made a table out of it. But when I do a "select
> *", Hive returns the upper ascii rows as '\xef\xbf\xbd', which is the
> replacement character '\ufffd' encoded in UTF-8.
>
> Does anyone know how to work with non-UTF8 data?
>
> Cheers,
> --
> bc Wong
> Cloudera Software Engineer
>

--
Yours,
Zheng
http://www.linkedin.com/in/zshao
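
For readers following up on the suggestion above to replace Text.toString() with a custom conversion: Text.toString() decodes its bytes as UTF-8, so latin1 bytes above 0x7F are malformed input and come back as U+FFFD. The following is a minimal sketch of such a conversion, not the actual Hive change. The class name Latin1Decode and the helper decodeLatin1 are hypothetical, and the sketch works on a plain byte array so that it runs without Hadoop on the classpath.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Decode {

    // Hypothetical helper (not part of Hive or Hadoop): decode the first
    // `length` bytes of `bytes` as ISO-8859-1 (latin1). Every byte value
    // 0-255 maps to the Unicode code point with the same number.
    static String decodeLatin1(byte[] bytes, int length) {
        return new String(bytes, 0, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Same test data as in the original question: byte values 0..255.
        byte[] raw = new byte[256];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) i;
        }

        String asLatin1 = decodeLatin1(raw, raw.length);
        String asUtf8 = new String(raw, 0, raw.length, StandardCharsets.UTF_8);

        // latin1 decoding keeps one char per byte.
        System.out.println("latin1 length: " + asLatin1.length());
        // UTF-8 decoding turns the malformed upper bytes into U+FFFD.
        System.out.println("utf8 has U+FFFD: " + (asUtf8.indexOf('\uFFFD') >= 0));

        // Assumed wiring inside FetchTask.fetch(), untested against Hive:
        //   Text t = (Text) mSerde.serialize(io.o, io.oi);
        //   res.add(decodeLatin1(t.getBytes(), t.getLength()));
    }
}
```

The length is passed separately because Text.getBytes() returns the backing array, which is only valid up to Text.getLength(). The charset is hard-coded here for brevity; a real change would more likely read it from a table or session setting.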