Subject: Re: Field delimiter in hive
From: mahender bigdata
To: "user@hive.apache.org"
Date: Mon, 7 Mar 2016 12:32:57 -0800

Any help on this?

On 3/3/2016 2:38 PM, mahender bigdata wrote:
Hi,

I'm a bit confused about which character to use as a general-purpose field delimiter for Hive tables. Can anyone suggest the best Unicode character, one that does not occur as part of the data?

Here are a couple of options I'm thinking of for the field delimiter. Please let me know which one is best to use, i.e. which character is least likely to appear in day-to-day data.

\U0001 = START OF HEADING (SOH) ==> (Ctrl+Shift+A in Windows) ==> Hive default delimiter

\U001F = INFORMATION SEPARATOR ONE = unit separator (US) ==> (Ctrl+Shift+- in Windows)

\U001E = INFORMATION SEPARATOR TWO = record separator (RS) ==> (Ctrl+Shift+6 in Windows)

Somehow, going by the name, I feel \U001F is the best option. Can anyone comment, or suggest a better Unicode character that does not appear in regular data?
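
One way to check the three candidates above against real data, rather than judging by name alone, is to scan a sample of the rows and see which characters never occur. The following is only a minimal sketch in Python: the function names and the sample rows are hypothetical and not part of the original question, and a clean sample does not guarantee that the full data set is clean.

```python
from collections import Counter

# Candidate delimiters from the message above, with their Unicode names.
CANDIDATES = {
    "\u0001": "START OF HEADING (SOH)",
    "\u001f": "INFORMATION SEPARATOR ONE (US)",
    "\u001e": "INFORMATION SEPARATOR TWO (RS)",
}


def count_delimiter_hits(lines, candidates=CANDIDATES):
    """Count how often each candidate character occurs in the sample lines."""
    hits = Counter({ch: 0 for ch in candidates})
    for line in lines:
        for ch in candidates:
            hits[ch] += line.count(ch)
    return hits


def safe_delimiters(lines, candidates=CANDIDATES):
    """Return the candidates that never occur in the sample, in declared order."""
    hits = count_delimiter_hits(lines, candidates)
    return [ch for ch in candidates if hits[ch] == 0]


def octal_escape(ch):
    """Format a character as a three-digit octal escape, e.g. backslash + 037."""
    return "\\{:03o}".format(ord(ch))


# Hypothetical sample: the second row happens to contain an SOH character.
sample = ["alice,42,NY", "bob\u0001x,17,CA", "carol,33,TX"]
for ch in safe_delimiters(sample):
    print(octal_escape(ch), CANDIDATES[ch])
```

The octal form is printed because Hive's default delimiter is commonly written as '\001' in a ROW FORMAT DELIMITED FIELDS TERMINATED BY clause; whether a given Hive version accepts the same form for the other two candidates is an assumption to verify against its documentation.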



