hive-user mailing list archives

From Kevin Weiler
Subject Re: Remove duplicate records in Hive
Date Wed, 10 Sep 2014 17:15:57 GMT
If you can just query the table for your results, you can do a SELECT DISTINCT instead of just
a SELECT. If you give me a bit more information about where the duplicate data is coming from,
I can provide a bit more detail. You can come see me on the end of desk.
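
For the sample in the original question, SELECT DISTINCT on its own would keep both rows for 100 1, because the two rows differ in the date column. One possible sketch for keeping only the latest row per key is a windowed ROW_NUMBER(), which Hive supports from 0.11. The table name (sample_table) and column names (id, code, dt) are hypothetical, since the thread does not give the real schema, and the query assumes the date is stored as a string like '2-oct-2013' that unix_timestamp can parse with the pattern 'd-MMM-yyyy'.

```sql
-- Sketch only: sample_table, id, code and dt are hypothetical names.
-- Assumes dt is a STRING such as '2-oct-2013'.
SELECT id, code, dt
FROM (
  SELECT id, code, dt,
         -- Number the rows within each (id, code) group, latest date first.
         ROW_NUMBER() OVER (
           PARTITION BY id, code
           ORDER BY unix_timestamp(dt, 'd-MMM-yyyy') DESC
         ) AS rn
  FROM sample_table
) ranked
-- Keep only the latest row of each group.
WHERE rn = 1;
```

If the date column is already a DATE or TIMESTAMP type, the ORDER BY can use the column directly instead of unix_timestamp.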

Kevin Weiler
IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 |
Phone: +1 312-204-7439 | Fax: +1 312-244-3301

On Sep 10, 2014, at 12:04 PM, Raj Hadoop wrote:


I have a requirement in Hive to remove duplicate records (they differ only by one column,
i.e. a date column) and keep the record with the latest date.

Sample Hive table:
d2 is a higher

100 1 1-oct-2013
101 2 1-oct-2013
100 1 2-oct-2013
102 2 2-oct-2013

Output needed:

100 1 2-oct-2013
101 2 1-oct-2013
102 2 2-oct-2013

I am using Hive 0.11.

Any suggestions, please?


