Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id AC7AD200BFE for ; Mon, 16 Jan 2017 18:01:24 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id AB03D160B41; Mon, 16 Jan 2017 17:01:24 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id CF8B6160B28 for ; Mon, 16 Jan 2017 18:01:23 +0100 (CET) Received: (qmail 83511 invoked by uid 500); 16 Jan 2017 17:01:22 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 83501 invoked by uid 99); 16 Jan 2017 17:01:22 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 16 Jan 2017 17:01:22 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 115EA180BB7 for ; Mon, 16 Jan 2017 17:01:22 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.38 X-Spam-Level: ** X-Spam-Status: No, score=2.38 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id SsbNDmkn0_6N for ; Mon, 16 Jan 2017 17:01:20 +0000 (UTC) Received: from mail-wm0-f52.google.com (mail-wm0-f52.google.com [74.125.82.52]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 2E9C75FB5B for ; Mon, 16 Jan 2017 17:01:20 +0000 (UTC) Received: by mail-wm0-f52.google.com with SMTP id c206so184289450wme.0 for ; Mon, 16 Jan 2017 09:01:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=vvG8B7pGDxhN+/0MegzMdxY9y2WpWWCq2NwM3W6NGsw=; b=f6Hn2iSYxB8j/jFpgejmc9VT/f5YW4Lptpedk87WlyWQ2+HV8B9On8C0BQWdymKb2V BqKjjOZZhSWDdPecNuOd/nlRJ9qf7Yhsl322Pq1Lz62XnGtk5OO0YhDnCZQ6T0Il1Dew nJLys7xP3Nsqg6uVaBAIOTgL9hNy3Q1PF3Z93aoP0J+cPmGkBk1yLgvu9kAJY6GdMzhZ 59fz7w6oFh7HyL4xcGSFVzSZebJnTCTzbH8y3ELEUjjPQMxw2dHq5vUUv5SeNSjbLACY jSCJ4GOADk5eIRnmOn1T7POeoXZ6ChwAkklJkLzuq80DfPXElG2tZfDyXaB1hL5jHe+8 J9YQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=vvG8B7pGDxhN+/0MegzMdxY9y2WpWWCq2NwM3W6NGsw=; b=X9iOuis388jFHTWqsoP7uuIUAxlEVPurkPAB3l5UKC0EUVtsLl8JrMX0ajD2ZNJIwL h7bY0iUmxt+HtOzuisaLSPHZ74lzWoctVDDo6LzzadNEyEdo+8KuAqQKtfJJ9oa8Vu3d cgwHB4LHOeeeXiFO2Rp3P+Uj4/Elw0jvpVBWHXb2A4K3Y/nBe9JHxVHg7KEvuNNbchWp Ly0q25/I1tuD3XElwynHdvbqh+ShFj2WvlVFz4xHTjUZtMPKbNuzyG9HmCfzk0JOE1Q9 Tx7S0d7wKEtH6sdORXVf3wCO6JFFUUAHHFs9bdO7A7JniokXnrvCnPo7x+n8nqJ51iuP 7dDA== X-Gm-Message-State: AIkVDXL+defkVaDBhd83RBxosBP6/W538/fc1/LNTo1+m8/u9L8MiJu+c4XgipYa4SF2GIXg4N/mrla9Lrj1qQ== X-Received: by 10.28.197.142 with SMTP id v136mr13617970wmf.62.1484586075481; Mon, 16 Jan 2017 09:01:15 -0800 (PST) MIME-Version: 1.0 Received: by 10.28.70.5 with HTTP; Mon, 16 Jan 2017 09:00:54 -0800 (PST) In-Reply-To: References: From: Elliot West Date: Mon, 16 Jan 2017 17:00:54 +0000 Message-ID: Subject: Re: VARCHAR or STRING fields in Hive To: "user@hive.apache.org" Content-Type: multipart/alternative; boundary=94eb2c0d436aaaf111054639210d archived-at: Mon, 16 Jan 2017 17:01:24 -0000 --94eb2c0d436aaaf111054639210d Content-Type: text/plain; charset=UTF-8 Internally it looks as though Hive simply represents CHAR/VARCHAR values using a Java String and so I would not expect a significant change in execution performance. The Hive JIRA suggests that these types were added to 'support for more SQL-compliant behavior, such as SQL string comparison semantics, max length, etc.' rather than for performance reasons. - https://issues.apache.org/jira/browse/HIVE-4844 - https://issues.apache.org/jira/browse/HIVE-5191 In terms of storage I expect it depends on the underlying file format and the types that these values are encoded to. Parquet does appear to support the specific encoding of both CHAR/VARCHAR, however I'm skeptical that there would be significant storage efficiencies gained by using the CHAR types, over String for comparable values. I'd be keen to hear otherwise. - https://issues.apache.org/jira/browse/HIVE-7735 Thanks, Elliot. On 16 January 2017 at 15:37, Mich Talebzadeh wrote: > > Coming from DBMS background I tend to treat the columns in Hive similar to > an RDBMS table. For example if a table created in Hive as Parquet I will > use VARCHAR(30) for column that has been defined as VARCHAR(30) as source. > If a column is defined as TEXT in RDBMS table I use STRING in Hive with a > max size of 2GB I believe. > > My view is that it is more efficient storage wise to have Hive table > created as VARCHA as opposed to STRING. > > I have not really seen any performance difference if one uses VARCHAR or > STRING. However, I believe there is a reason why one has VARCH in Hive as > opposed to STRRING. > > What is the thread view on this? > > Thanks > > > Dr Mich Talebzadeh > > > > LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > * > > > > http://talebzadehmich.wordpress.com > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > --94eb2c0d436aaaf111054639210d Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Internally it looks as though Hive simply represents CHAR/= VARCHAR values using a Java String and so I would not expect a significant = change in execution performance. The Hive JIRA suggests that these types we= re added to 'support= for more SQL-compliant behavior, such as SQL string comparison semantics, = max length, etc.' rather than for performance reasons.
In terms of storage I expect it depends on the underlyi= ng file format and the types that these values are encoded to.=C2=A0Parquet does appear to supp= ort the=C2=A0specific=C2=A0encoding of both CHAR/VARCHAR, however I'm s= keptical that there would be significant storage=C2=A0efficiencies=C2=A0gai= ned by using the CHAR types, over String for comparable values. I'd be = keen to hear otherwise.
Thanks,
<= span style=3D"font-size:14px">
Elliot.

On 16 January 2017 a= t 15:37, Mich Talebzadeh <mich.talebzadeh@gmail.com>= wrote:

<= div>Coming from DBMS background I tend to treat the columns in Hive similar= to an RDBMS table. For example if a table created in Hive as Parquet I wil= l use VARCHAR(30) for column that has been defined as VARCHAR(30) as source= . If a column is defined as TEXT in RDBMS table I use STRING in Hive with a= max size of 2GB I believe.

My view is that it is = more efficient storage wise to have Hive table created as VARCHA as opposed= to STRING.

I have not really seen any performanc= e difference if one uses VARCHAR or STRING. However, I believe there is a r= eason why one has VARCH in Hive as opposed to STRRING.

=
What is the thread view on this?

Thanks
=


Dr Mich Talebzadeh

=C2=A0

LinkedIn =C2=A0https://www.linkedin.com/profile/view?id= =3DAAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw<= /p>

=C2=A0

http:= //talebzadehmich.wordpress.com


Disclaimer:=C2=A0= Use it=C2=A0at your own risk. Any and all responsib= ility for any loss, damage or destruction of data or any other property which may arise from relying on this email= 9;s=C2=A0technical=C2=A0content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from = such loss, damage or destruction.

=C2=A0

<= font color=3D"#000000" face=3D"Times New Roman" size=3D"3">

--94eb2c0d436aaaf111054639210d--