From: Mohit Durgapal
Date: Fri, 11 Apr 2014 12:26:11 +0530
Subject: hive query to select top 10 product of each subcategory and select most recent product info
To: user@hive.apache.org

I have a Hive table partitioned by date. It contains e-commerce data in the format siteid, sitecatid, catid, subcatgid, pid, pname, pprice, pmrp, pdesc....



What I need is a Hive query on the table above that returns the top 10 products (by count) in each subcategory. What adds a bit more complexity is that I need all the information for each product. When I group by only subcatg and pid, I can select only those fields. But I want all the data for that product, such as prodname, proddesc, price, mrp and imageurl, to come in the same row as subcatg and prodid. And since some information, such as the price and proddesc of a product, keeps changing, I want to pick the latest column values (according to a date field) for each pid, assuming a group by on subcatg and pid is possible.
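
As a minimal sketch of the intended result, the logic can be written in plain Python (this is not Hive, only an illustration of the semantics). The function name top_products, the date column name dt and the sample rows are assumptions made up for the example; only subcatgid, pid, pname and pprice come from the table described above.

```python
from collections import defaultdict


def top_products(rows, n=10, date_field="dt"):
    """Return {subcatgid: top-n products by row count, each with its latest row}."""
    counts = defaultdict(int)  # (subcatgid, pid) -> number of rows seen
    latest = {}                # (subcatgid, pid) -> most recent row seen
    for row in rows:
        key = (row["subcatgid"], row["pid"])
        counts[key] += 1
        # ISO-formatted date strings compare correctly as plain strings.
        if key not in latest or row[date_field] > latest[key][date_field]:
            latest[key] = row

    by_subcat = defaultdict(list)
    for key, row in latest.items():
        # Copy the latest row and attach the count for that product.
        by_subcat[key[0]].append(dict(row, cnt=counts[key]))

    # Highest count first; ties broken by pid so the order is deterministic.
    return {
        subcat: sorted(items, key=lambda r: (-r["cnt"], r["pid"]))[:n]
        for subcat, items in by_subcat.items()
    }


# Hypothetical sample data, for illustration only.
rows = [
    {"subcatgid": "s1", "pid": "p1", "pname": "Phone", "pprice": 100, "dt": "2014-04-01"},
    {"subcatgid": "s1", "pid": "p1", "pname": "Phone", "pprice": 90, "dt": "2014-04-05"},
    {"subcatgid": "s1", "pid": "p2", "pname": "Case", "pprice": 10, "dt": "2014-04-02"},
    {"subcatgid": "s2", "pid": "p3", "pname": "Lamp", "pprice": 25, "dt": "2014-04-03"},
]
result = top_products(rows, n=1)
```

With n=1, subcategory s1 keeps only p1, which has the higher count, and the row returned for it carries the values from its most recent date rather than from an arbitrary row in the group.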


I have not been able to find a solution to this problem in Hive. Any help would be much appreciated.


Regards
Mohit