hive-dev mailing list archives

From "Lefty Leverenz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4590) HCatalog documentation example is wrong
Date Mon, 21 Jul 2014 01:20:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068095#comment-14068095 ]

Lefty Leverenz commented on HIVE-4590:
--------------------------------------

[~eugene.koifman], it's past time to fix this but first I have a couple of questions:

#  Why does the equivalent SELECT statement say "col1" while the description says "an integer
in the second column"?  Does this assume column numbers start with zero?
#*  "select col1, count(*) from $table group by col1;"
I tried to figure it out from the MR program, but strained my brain.
#  Is there a typo in the output for your sample dataset (1,1,1,3,3,3,5)?  I see three 3s
in the input, so the count for 3 should be 3, not 2.
#*  1, 3
3, 2,
5, 1
... and presumably the comma after the 2 (or 3) can be removed.
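
For reference, here is a minimal Python sketch of the grouping the SELECT statement describes. It is not the MR program from the doc. The row layout, the helper name count_by_column, and the assumption that "the second column" means zero-based index 1 are all mine, chosen only to illustrate the two questions above.

```python
from collections import Counter


def count_by_column(rows, index):
    """Return sorted (value, count) pairs for the column at the given zero-based index.

    Mirrors "select col, count(*) from t group by col" for one column.
    """
    return sorted(Counter(row[index] for row in rows).items())


# Hypothetical rows: the first field is filler, the second field (index 1)
# holds the sample values 1,1,1,3,3,3,5 from the issue description.
rows = [("a", 1), ("b", 1), ("c", 1), ("d", 3), ("e", 3), ("f", 3), ("g", 5)]

for value, count in count_by_column(rows, 1):
    print(f"{value}, {count}")
```

With that sample input the sketch reports a count of 3 for the value 3, which is why the "3, 2," line looks like a typo to me.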

The doc has a new location, by the way:

* [HCat Input and Output -- Read Example | https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-ReadExample]

> HCatalog documentation example is wrong
> ---------------------------------------
>
>                 Key: HIVE-4590
>                 URL: https://issues.apache.org/jira/browse/HIVE-4590
>             Project: Hive
>          Issue Type: Bug
>          Components: Documentation, HCatalog
>    Affects Versions: 0.10.0
>            Reporter: Eugene Koifman
>            Assignee: Lefty Leverenz
>            Priority: Minor
>
> http://hive.apache.org/docs/hcat_r0.5.0/inputoutput.html#Read+Example
> reads
> The following very simple MapReduce program reads data from one table which it assumes
to have an integer in the second column, and counts how many different values it sees. That
is, it does the equivalent of "select col1, count(*) from $table group by col1;".
> The description of the query is wrong.  It actually counts how many instances of each
distinct value it finds.  For example, if values of col1 are {1,1,1,3,3,3,5} it will produce
> 1, 3
> 3, 2,
> 5, 1
>  



--
This message was sent by Atlassian JIRA
(v6.2#6252)
