Date: Mon, 9 Apr 2012 08:17:23 +0800
Subject: Re: Make hive support various charsets
From: Zhang Kai
To: dev@hive.apache.org

Hi,

I have created an issue, HIVE-2917, and submitted a patch through Phabricator.
Would anyone be willing to review it?

Thanks,
Kai Zhang

2012/3/29 Namit Jain

> Kai,
>
> That would be great.
>
> Please file a JIRA and submit a patch.
> We would definitely like to get it in for the whole community.
>
> Thanks,
> -namit
>
> On 3/28/12 8:46 PM, "Zhang Kai" wrote:
>
> >Hi all,
> >
> >I've been working with Hive for some time.
> >
> >In my company, we use Hive to query large datasets and have found it
> >very easy to use.
> >
> >However, we also found that Hive lacks support for various charsets, so
> >we have to manually convert data files to UTF-8 before loading them
> >into Hive.
> >
> >So I have made a patch that lets Hive set a charset when creating
> >a table.
> >The charset property is then used by the SerDe when it serializes or
> >deserializes data.
> >
> >The modified HQL looks like this:
> >
> >CREATE TABLE tbl1 (col1 STRING) ROW FORMAT CHARSET "GBK" DELIMITED FIELDS
> >TERMINATED BY '\t';
> >
> >I'm very happy to contribute this to the community and look forward to
> >your feedback.
> >
> >Thanks,
> >Kai Zhang
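
To illustrate the idea in the quoted proposal (a per-table charset that the SerDe applies when reading and writing rows), here is a minimal Python sketch. It is not Hive's SerDe API and not the code from the HIVE-2917 patch; the function names deserialize_row and serialize_row are hypothetical, and the sketch assumes newline-terminated, tab-delimited rows as in the CREATE TABLE example.

```python
from typing import List


def deserialize_row(raw: bytes, charset: str, delimiter: str = "\t") -> List[str]:
    """Decode one delimited row using the table's declared charset.

    Hypothetical helper: decodes first, then splits, so multi-byte
    characters are never cut at a delimiter byte.
    """
    text = raw.decode(charset)
    return text.rstrip("\n").split(delimiter)


def serialize_row(fields: List[str], charset: str, delimiter: str = "\t") -> bytes:
    """Encode one row back into the table's declared charset."""
    return (delimiter.join(fields) + "\n").encode(charset)


if __name__ == "__main__":
    # A GBK-encoded row, as it might sit in a data file before loading.
    raw = "中文\t数据\n".encode("gbk")
    fields = deserialize_row(raw, "gbk")
    print(fields)
    # Writing the same fields back out reproduces the original bytes.
    print(serialize_row(fields, "gbk") == raw)
```

With a charset declared on the table, the conversion happens at read and write time, so the manual step of converting data files to UTF-8 before loading would no longer be needed.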