Mailing-List: contact user-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hive.apache.org
From: Michael Häusler <michael@akatose.de>
Content-Type: multipart/alternative; boundary="Apple-Mail=_75A71316-778F-4E10-91DD-397536C1D2A0"
Message-Id: <38E7D38A-93FC-49AA-A81C-47A1F29470B5@akatose.de>
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Subject: Re: column statistics for non-primitive types
Date: Tue, 14 Jun 2016 22:03:11 +0200
References: <5144D9F8-53A0-4C22-A9AF-3F506F0F79B8@akatose.de> <CAJ3fcbA=A_MRLaorMZmWvGH_xrCBnF0L4WJDT3iJkpt86ph7Gg@mail.gmail.com> <4DED3455-7041-4D35-8C0F-012B7BE96D8C@akatose.de> <CAJ3fcbDEdbjF_HHVcwggOQLpyjcktNgM6BDJzdLqkywUUvr4Sw@mail.gmail.com>
To: user@hive.apache.org
In-Reply-To: <CAJ3fcbDEdbjF_HHVcwggOQLpyjcktNgM6BDJzdLqkywUUvr4Sw@mail.gmail.com>
archived-at: Tue, 14 Jun 2016 20:03:26 -0000


--Apple-Mail=_75A71316-778F-4E10-91DD-397536C1D2A0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8

Hi there,

there might be two topics here:

1) feasibility of stats for non-primitive columns
2) ease of use


1) feasibility of stats for non-primitive columns:

Hive currently collects different kinds of statistics for different kinds of types:
numeric values:	min, max, #nulls, #distincts
boolean values:	#nulls, #trues, #falses
string values:		#nulls, #distincts, avgLength, maxLength

So, it seems quite possible to also collect at least partial stats for top-level non-primitive columns, e.g.:
array values:		#nulls, #distincts, avgLength, maxLength
map values:		#nulls, #distincts, avgLength, maxLength
struct values:		#nulls, #distincts
union values:		#nulls, #distincts
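
As a rough sketch of what such partial stats could look like, some of these values can already be computed by hand with an ordinary query. This assumes the example table foo (with the column bar ARRAY<BIGINT>) from the original message quoted below, and it interprets avgLength and maxLength as the number of elements in the array, which is an assumption of this sketch and not an existing Hive definition. #distincts is left out because it is unclear whether COUNT(DISTINCT ...) accepts complex types:

```sql
-- Hand-rolled partial stats for the array column `bar` of table `foo`.
-- The output column names are illustrative only.
SELECT
  -- number of rows where the whole array is NULL
  SUM(CASE WHEN bar IS NULL THEN 1 ELSE 0 END)       AS num_nulls,
  -- NULL arrays are excluded explicitly so that they do not
  -- contribute to the length figures
  AVG(CASE WHEN bar IS NOT NULL THEN size(bar) END)  AS avg_length,
  MAX(CASE WHEN bar IS NOT NULL THEN size(bar) END)  AS max_length
FROM foo;
```

Unlike real column statistics, the result of this query is not stored in the metastore, so the optimizer cannot use it.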


2) ease of use

The presence of a single non-primitive column currently breaks the convenience shorthand for gathering statistics for all columns (ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS;). IMHO, this slows down adoption of column statistics among Hive users.
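
Until the shorthand skips unsupported columns, one workaround sketch is to name the primitive columns explicitly. For the example table foo from the original message quoted below, the only primitive column is foo (BIGINT), so the statement would presumably look like this (not verified on every Hive version):

```sql
-- Workaround sketch: list only the primitive columns by hand.
-- `bar` (ARRAY<BIGINT>) and `foobar` (STRUCT<...>) are left out,
-- because they trigger the UDFArgumentTypeException.
ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS foo;
```

For wide tables this means maintaining the column list manually, which is exactly the inconvenience described above.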

Best regards
Michael


> On 2016-06-14, at 12:04, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>
> Hi Michael,
>
> Statistics for columns in Hive are kept in the Hive metadata table tab_col_stats.
>
> When I look at this table in Oracle, I only see statistics for primitive columns. STRUCT columns do not have any, as a STRUCT column would have to be broken into its primitive columns. I don't think Hive has the means to do that.
>
> desc tab_col_stats;
>  Name                    Null?    Type
>  ----------------------- -------- --------------
>  CS_ID                   NOT NULL NUMBER
>  DB_NAME                 NOT NULL VARCHAR2(128)
>  TABLE_NAME              NOT NULL VARCHAR2(128)
>  COLUMN_NAME             NOT NULL VARCHAR2(1000)
>  COLUMN_TYPE             NOT NULL VARCHAR2(128)
>  TBL_ID                  NOT NULL NUMBER
>  LONG_LOW_VALUE                   NUMBER
>  LONG_HIGH_VALUE                  NUMBER
>  DOUBLE_LOW_VALUE                 NUMBER
>  DOUBLE_HIGH_VALUE                NUMBER
>  BIG_DECIMAL_LOW_VALUE            VARCHAR2(4000)
>  BIG_DECIMAL_HIGH_VALUE           VARCHAR2(4000)
>  NUM_NULLS               NOT NULL NUMBER
>  NUM_DISTINCTS                    NUMBER
>  AVG_COL_LEN                      NUMBER
>  MAX_COL_LEN                      NUMBER
>  NUM_TRUES                        NUMBER
>  NUM_FALSES                       NUMBER
>  LAST_ANALYZED           NOT NULL NUMBER
>
> So, in summary, although columns of type STRUCT do exist, I don't think Hive can cater for their statistics. Actually, I don't think Oracle itself does it either.
>
> HTH
>
> P.S. I am on Hive 2 and it does not work there either.
>
> hive> analyze table foo compute statistics for columns;
> FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but array<bigint> is passed.
>
>
> Dr Mich Talebzadeh
>
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
>
> On 14 June 2016 at 09:57, Michael Häusler <michael@akatose.de> wrote:
> Hi there,
>
> you can reproduce the messages below with Hive 1.2.1.
>
> Best regards
> Michael
>
>
>> On 2016-06-13, at 22:21, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>
>> which version of Hive are you using?
>>
>> On 13 June 2016 at 16:00, Michael Häusler <michael@akatose.de> wrote:
>> Hi there,
>>
>>
>> when testing column statistics I stumbled upon the following error message:
>>
>> DROP TABLE IF EXISTS foo;
>> CREATE TABLE foo (foo BIGINT, bar ARRAY<BIGINT>, foobar STRUCT<key:STRING,value:STRING>);
>>
>> ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS;
>> FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but array<bigint> is passed.
>>
>> ANALYZE TABLE foo COMPUTE STATISTICS FOR COLUMNS foobar, bar;
>> FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but struct<key:string,value:string> is passed.
>>
>>
>> 1) Basically, it seems that column statistics don't work for non-primitive types. Are there any workarounds or any plans to change this?
>>
>> 2) Furthermore, the convenience syntax to compute statistics for all columns does not work as soon as there is an unsupported column. Are there any plans to change this, so that it is easier to compute statistics for all supported columns?
>>
>> 3) ANALYZE TABLE only reports the first failing *type* in the error message. Especially for wide tables, it would be much easier if all unsupported column *names* were printed.
>>
>>
>> Best regards
>> Michael
>>
>>
>
>


--Apple-Mail=_75A71316-778F-4E10-91DD-397536C1D2A0--