From: Kevin Weiler
To: "user@hive.apache.org"
Subject: Re: Remove duplicate records in Hive
Date: Wed, 10 Sep 2014 17:15:57 +0000
In-Reply-To: <1410368697.4208.YahooMailBasic@web120705.mail.ne1.yahoo.com>

If you can just query the table for your results, you can do a SELECT DISTINCT instead of just a SELECT. If you give me a bit more information about where the duplicate data is coming from, I can provide a bit more detail. You can come see me on the end of desk.
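
As a minimal sketch of the SELECT DISTINCT suggestion: the table name `records` and the column name `dt` (standing in for the date column) are assumptions for illustration, not names given in this thread. SELECT DISTINCT collapses rows that match in every selected column, so it removes exact duplicates; rows that differ in any selected column are still returned separately.

```sql
-- Hypothetical table and column names; adjust to the real schema.
-- Returns one row per distinct (cno, sqno, dt) combination.
SELECT DISTINCT cno, sqno, dt
FROM records;
```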

--
Kevin Weiler

IT

IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL 60606 | http://imc-chicago.com/
Phone: +1 312-204-7439 | Fax: +1 312-244-3301 | E-Mail: kevin.weiler@imc-chicago.com

On Sep 10, 2014, at 12:04 PM, Raj Hadoop <hadoopraj@yahoo.com> wrote:


Hi,

I have a requirement in Hive to remove duplicate records (they differ only by one column, i.e. a date column) and keep the record with the latest date.

Sample :
Hive Table :
d2 is a higher
cno,sqno,date

100 1 1-oct-2013
101 2 1-oct-2013
100 1 2-oct-2013
102 2 2-oct-2013


Output needed:

100 1 2-oct-2013
101 2 1-oct-2013
102 2 2-oct-2013

I am using Hive 0.11.

Any suggestions, please?

Regards,
Raj
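
The requirement above (keep only the latest-dated row for each `cno`, `sqno` pair) could be sketched with a window function. This is one possible approach under stated assumptions, not something proposed in the thread: the table name `records` and the column name `dt` are hypothetical, the date is assumed to be stored as a string such as `1-oct-2013`, the pattern `'d-MMM-yyyy'` is assumed to parse those values, and windowing functions are assumed to be available (they were introduced in Hive 0.11).

```sql
-- Hypothetical names: table `records`; `dt` stands in for the date column,
-- assumed to be a string such as '1-oct-2013'.
SELECT cno, sqno, dt
FROM (
  SELECT cno, sqno, dt,
         -- Number the rows within each (cno, sqno) group, newest date first.
         ROW_NUMBER() OVER (
           PARTITION BY cno, sqno
           ORDER BY unix_timestamp(dt, 'd-MMM-yyyy') DESC
         ) AS rn
  FROM records
) ranked
-- Keep only the newest row of each group.
WHERE rn = 1;
```

Parsing the string with `unix_timestamp` matters because ordering the raw text would compare dates alphabetically rather than chronologically. Under these assumptions, the sample rows should reduce to the three rows listed under "Output needed", though the row order is not guaranteed.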




The information in this e-mail is intended only for the person or entity to which it is addressed.

It may contain confidential and /or privileged material. If someone other than the intended recipient should receive this e-mail, he / she shall not be entitled to read, disseminate, disclose or duplicate it.

If you receive this e-mail unintentionally, please inform us immediately by "reply" and then delete it from your system. Although this information has been compiled with great care, neither IMC Financial Markets & Asset Management nor any of its related entities shall accept any responsibility for any errors, omissions or other inaccuracies in this information or for the consequences thereof, nor shall it be bound in any way by the contents of this e-mail or its attachments. In the event of incomplete or incorrect transmission, please return the e-mail to the sender and permanently delete this message and any attachments.

Messages and attachments are scanned for all known viruses. Always scan attachments before opening them.