hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Frank Luo <j...@merkleinc.com>
Subject RE: how to get counts as a byproduct of a query
Date Thu, 03 Dec 2015 17:13:33 GMT
Ryan,
Thanks for your reply.

Your previous response gave me some hints. I think below will scan tables just once:

from table_A a join table_B b on a.X = b.X
insert INTO TABLE table_C select a.X, a.Y, b.Z
insert OVERWRITE TABLE count_A select count(a.X)
insert OVERWRITE TABLE count_B select count(b.X)
;


From: Ryan Harris [mailto:Ryan.Harris@zionsbancorp.com]
Sent: Wednesday, December 02, 2015 4:20 PM
To: user@hive.apache.org
Subject: RE: how to get counts as a byproduct of a query

Personally, I'd do it this way...

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

Select suba.X, suba.Y, suba.countA, subb.Z, subb.countB
FROM
(SELECT x, y, count(1) OVER (PARTITION BY X) as countA) suba
JOIN
(SELECT x, z, count(1) OVER (PARTITION BY X) as countB) subb
ON (suba.X = subb.X)

From: Frank Luo [mailto:jluo@merkleinc.com]
Sent: Wednesday, December 02, 2015 2:43 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: RE: how to get counts as a byproduct of a query

I might not illustrate the problem well. Let’s try on a sample.

Here is what I have:
Table_A, column X and Y
Table B, column X and Z

I want to do a join on both tables on column X, like
“select a.X, a.Y, b.Z
From A a
Join B b on A.X=B.X”

In the meaning time, I want to get counts for both Table A and B. I am not able to write a
query to do both.


From: Jörn Franke [mailto:jornfranke@gmail.com]
Sent: Wednesday, December 02, 2015 3:22 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: how to get counts as a byproduct of a query

I am  not sure if I understand, but why this should not be possible using SQL in hive?

On 02 Dec 2015, at 21:26, Frank Luo <jluo@merkleinc.com<mailto:jluo@merkleinc.com>>
wrote:
Didn’t get any response, so trying one more time. I cannot believe I am the only one facing
the problem.

From: Frank Luo
Sent: Tuesday, December 01, 2015 10:40 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: how to get counts as a byproduct of a query

Very often I need to run a query against a table(s), then collect some counts. I am wondering
if there is a way to kill two birds by scanning the table once. (I don’t mind to save the
counts as a separate file or something like that)

For example, I got a table A and B. I need to do an inner join to get some result. In the
meaning time, I need to know the table counts for both A and B. Is there a smart way to get
join result as well as the counts by reading the tables once?

Thanks in advance.

This email and any attachments transmitted with it are intended for use by the intended recipient(s)
only. If you have received this email in error, please notify the sender immediately and then
delete it. If you are not the intended recipient, you must not keep, use, disclose, copy or
distribute this email without the author’s prior permission. We take precautions to minimize
the risk of transmitting software viruses, but we advise you to perform your own virus checks
on any attachment to this message. We cannot accept liability for any loss or damage caused
by software viruses. The information contained in this communication may be confidential and
may be subject to the attorney-client privilege.

This email and any attachments transmitted with it are intended for use by the intended recipient(s)
only. If you have received this email in error, please notify the sender immediately and then
delete it. If you are not the intended recipient, you must not keep, use, disclose, copy or
distribute this email without the author’s prior permission. We take precautions to minimize
the risk of transmitting software viruses, but we advise you to perform your own virus checks
on any attachment to this message. We cannot accept liability for any loss or damage caused
by software viruses. The information contained in this communication may be confidential and
may be subject to the attorney-client privilege.

________________________________
THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL and may contain
information that is privileged and exempt from disclosure under applicable law. If you are
neither the intended recipient nor responsible for delivering the message to the intended
recipient, please note that any dissemination, distribution, copying or the taking of any
action in reliance upon the message is strictly prohibited. If you have received this communication
in error, please notify the sender immediately. Thank you.

This email and any attachments transmitted with it are intended for use by the intended recipient(s)
only. If you have received this email in error, please notify the sender immediately and then
delete it. If you are not the intended recipient, you must not keep, use, disclose, copy or
distribute this email without the author’s prior permission. We take precautions to minimize
the risk of transmitting software viruses, but we advise you to perform your own virus checks
on any attachment to this message. We cannot accept liability for any loss or damage caused
by software viruses. The information contained in this communication may be confidential and
may be subject to the attorney-client privilege.
Mime
View raw message