Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 09BCC11D9E for ; Wed, 10 Sep 2014 18:49:30 +0000 (UTC) Received: (qmail 61790 invoked by uid 500); 10 Sep 2014 18:49:27 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 61719 invoked by uid 500); 10 Sep 2014 18:49:27 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 61709 invoked by uid 99); 10 Sep 2014 18:49:27 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 10 Sep 2014 18:49:27 +0000 X-ASF-Spam-Status: No, hits=-0.0 required=5.0 tests=RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of hadoopraj@yahoo.com designates 98.138.90.253 as permitted sender) Received: from [98.138.90.253] (HELO nm4-vm0.bullet.mail.ne1.yahoo.com) (98.138.90.253) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 10 Sep 2014 18:48:58 +0000 Received: from [98.138.100.112] by nm4.bullet.mail.ne1.yahoo.com with NNFMP; 10 Sep 2014 18:48:56 -0000 Received: from [98.138.226.166] by tm103.bullet.mail.ne1.yahoo.com with NNFMP; 10 Sep 2014 18:48:56 -0000 Received: from [127.0.0.1] by omp1067.mail.ne1.yahoo.com with NNFMP; 10 Sep 2014 18:48:56 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 110611.40727.bm@omp1067.mail.ne1.yahoo.com Received: (qmail 29556 invoked by uid 60001); 10 Sep 2014 18:48:56 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1410374936; bh=k4TZLMLy6G8i0uMw7fNiwndcd8swxaAgF/s3/0vBLGE=; h=Message-ID:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=B9woA/kD8aVDLiGopql5yUncH8ZjUrx5M1DGN7Jp2tgaiLnngMUA3SyPWn08v7PDLcgoXopSNUUScsHAFOvZ2TwbzwHU6E4+eyqBNk8qFzXygs9i/ie/IOaS8u6ENxNfPpiwec5MZNqQyrBT/67HgElhxtX6r/wq/ePoXxtD3Ww= X-YMail-OSG: mCJZV8cVM1muCA1e6Ozqw3UsLwuORufWhtMxG.o8KHMFV5s d3KYqSeIkqGy7aOU5jQyWtDRbAi_vlS1Mo5mtmC_vgrm1bk_0kVofv1bxXR9 yEJ_AWbc5XO0p1VBu10py8KwlLdyTBtcJyAIOzxeWNsykvMoxjWrjOipaiGl deJjxz3gVzAtCdpBagQVWp6yUZkNdR9c.yjZru6d8WYooTVCBadjofwBOIx8 nA4QWvUegPktm8mMNzh6dDisv0f2HolXQ3NK1.BVtBS.ZQkR6d_9_A17zUtp ZMsAl48Zbk_ljSLW.eq9EiehoC41.7r5azYaamjjWTLcmzIPeXmvU9RYyWs7 ZjYBkyCvd6OxTdIpZtkNd_8Zxc72wzsLVYmS9VG3atYHrD5BDwdZ5gEZjMm_ ytNHtgzSwhRb0mOOj6.amFIZcw3wkJU3PldlQXWkiRro1G23_iV9PVe11r2E wr4yYP7uBym5k4OCW6B01rt5_sVw5yjGXQjzrDEkrJMXdM9OOfTRuVFIJ6A- - Received: from [151.124.247.200] by web120703.mail.ne1.yahoo.com via HTTP; Wed, 10 Sep 2014 11:48:55 PDT X-Rocket-MIMEInfo: 002.001,VGhlDQoNClNPUlRfQVJSQVkoQ09MTEVDVF9TRVQoZGF0ZSkpWzBdIEFTIGxhdGVzdF9kYXRlDQoNCmlzIHJldHVybmluZyB0aGUgbG93ZXN0IGRhdGUuIEkgbmVlZCB0aGUgbGFyZ2VzdCBkYXRlLg0KDQoNCg0KLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0NCk9uIFdlZCwgOS8xMC8xNCwgUmFqIEhhZG9vcCA8aGFkb29wcmFqQHlhaG9vLmNvbT4gd3JvdGU6DQoNCiBTdWJqZWN0OiBSZTogUmVtb3ZlIGR1cGxpY2F0ZSByZWNvcmRzIGluIEhpdmUNCiBUbzogdXNlckBoaXZlLmEBMAEBAQE- X-Mailer: YahooMailClassic/725 YahooMailWebService/0.8.203.696 Message-ID: <1410374935.9863.YahooMailBasic@web120703.mail.ne1.yahoo.com> Date: Wed, 10 Sep 2014 11:48:55 -0700 From: Raj Hadoop Subject: Re: Remove duplicate records in Hive To: user@hive.apache.org In-Reply-To: <1410374500.34068.YahooMailBasic@web120704.mail.ne1.yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org The SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date is returning the lowest date. I need the largest date. -------------------------------------------- On Wed, 9/10/14, Raj Hadoop wrote: Subject: Re: Remove duplicate records in Hive To: user@hive.apache.org Date: Wednesday, September 10, 2014, 2:41 PM =20 Thanks. I will try it. -------------------------------------------- On Wed, 9/10/14, Nishant Kelkar wrote: =20 Subject: Re: Remove duplicate records in Hive To: user@hive.apache.org, hadoopraj@yahoo.com Date: Wednesday, September 10, 2014, 1:59 PM =20 Hi =20 Raj,=A0 You can do something along these lines:=A0 =20 SELECT cno, sqno, SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date FROM table GROUP BY cno, sqno; However, you have to make sure your date format is such that sorting it gives you the most recent date. The best way to do that is to have it in format: YYYY-MM-DD. Hope this helps. Best Regards,Nishant =20 Kelkar On Wed, Sep 10, 2014 at 10:04 AM, Raj Hadoop wrote: =20 =20 Hi, =20 =20 =20 I have a requirement in Hive to remove duplicate records ( they differ only by one column i.e a date column) and keep the latest date record. =20 =20 =20 Sample : =20 Hive Table : =20 =A0d2 is a higher =20 cno,sqno,date =20 =20 =20 100 1 1-oct-2013 =20 101 2 1-oct-2013 =20 100 1 2-oct-2013 =20 102 2 2-oct-2013 =20 =20 =20 =20 =20 Output needed: =20 =20 =20 100 1 2-oct-2013 =20 101 2 1-oct-2013 =20 102 2 2-oct-2013 =20 =20 =20 I am using Hive 0.11 =20 =20 =20 Any suggestions please ? =20 =20 =20 Regards, =20 =20 Raj =20 =20