From: Mich Talebzadeh
To: user@hive.apache.org
Date: Tue, 8 Mar 2016 12:02:01 +0000
Subject: Re: Field delimiter in hive

Try "~|~" as the field delimiter. It normally works for most conditions.

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


On 8 March 2016 at 11:56, Chandeep Singh <cs@chandeep.com> wrote:
I’ve been pretty successful with two pipes (||) or two carets (^^) based on my dataset, even though they aren’t Unicode.
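
One caveat on multi-character delimiters such as "~|~", "||" or "^^": as far as I know, Hive's default text SerDe (LazySimpleSerDe) expects a single-character field delimiter, so these usually need a multi-delimiter SerDe (hive-contrib ships a MultiDelimitSerDe, if I recall correctly) or a pre-processing pass. The sketch below is only an illustration of the pre-processing route, not anything proposed in this thread. The function names `split_fields` and `to_single_char` are hypothetical, and the choice of the unit separator as the target character is an assumption.

```python
from typing import List

MULTI_DELIM = "~|~"    # multi-character delimiter suggested in the thread
SINGLE_DELIM = "\x1f"  # assumption: unit separator as the single-character target


def split_fields(line: str, delim: str = MULTI_DELIM) -> List[str]:
    """Split one record on a multi-character delimiter."""
    return line.rstrip("\n").split(delim)


def to_single_char(line: str, src: str = MULTI_DELIM, dst: str = SINGLE_DELIM) -> str:
    """Rewrite a record so that it uses a single-character delimiter.

    Raises ValueError if the target delimiter already occurs in a field,
    because the rewritten record would then be ambiguous.
    """
    fields = split_fields(line, src)
    if any(dst in field for field in fields):
        raise ValueError(f"target delimiter {dst!r} already present in record")
    return dst.join(fields)


if __name__ == "__main__":
    record = "42~|~a|b~c~|~2016-03-08"
    print(split_fields(record))          # single '|' and '~' stay inside the field
    print(repr(to_single_char(record)))
```

Note that a lone "|" or "~" inside a field is left alone; only the full three-character sequence is treated as a separator.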

On Mar 7, 2016, at 8:32 PM, mahender bigdata <Mahender.BigData@outlook.com> wrote:

Any help on this.

On 3/3/2016 2:38 PM, mahender bigdata wrote:
Hi,

I'm a bit confused about which character should be used as the field delimiter for Hive tables in general. Can anyone suggest the best Unicode character, one that doesn't occur as part of the data?

Here are a couple of options I'm thinking of for the field delimiter. Please let me know which one is best to use, i.e. which character is least likely to appear in day-to-day data.

\U0001 = START OF HEADING ==> SOH ==> (CTRL+SHIFT+A in Windows) ==> Hive default delimiter

\U001F = INFORMATION SEPARATOR ONE = unit separator (US) ==> (CTRL+SHIFT+- in Windows)

\U001E = INFORMATION SEPARATOR TWO = record separator (RS) ==> (CTRL+SHIFT+6 in Windows)

Somehow, going by the name, I feel \U001F is the best option. Can anyone comment, or suggest the best Unicode character that doesn't occur in regular data?
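
No character is guaranteed to be absent from every dataset, so one practical way to choose among the three candidates above is to scan a sample of the raw data and see which of them never occur. The following is a minimal sketch under that assumption. The names `CANDIDATES`, `count_candidates` and `safe_candidates` are hypothetical, and the sample lines are made up for illustration.

```python
from typing import Dict, Iterable, List

# Candidate delimiters from the question; labels are only for readability.
CANDIDATES = {
    "\x01": "SOH (Hive default delimiter)",
    "\x1f": "US (unit separator)",
    "\x1e": "RS (record separator)",
}


def count_candidates(lines: Iterable[str]) -> Dict[str, int]:
    """Count how often each candidate delimiter occurs in the raw lines."""
    counts = {ch: 0 for ch in CANDIDATES}
    for line in lines:
        for ch in CANDIDATES:
            counts[ch] += line.count(ch)
    return counts


def safe_candidates(lines: Iterable[str]) -> List[str]:
    """Return the candidates that never occur in the sampled data."""
    return [ch for ch, n in count_candidates(lines).items() if n == 0]


if __name__ == "__main__":
    # Made-up sample; in practice, stream lines from a representative data file.
    sample = ["alice,30,London", "bob\x01smith,41,Paris"]
    for ch in safe_candidates(sample):
        print(repr(ch), CANDIDATES[ch])
```

A clean scan only says the character was absent from the sample, so it is worth repeating the check on new data before each load.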




