Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3350CF8A1 for ; Tue, 2 Apr 2013 19:51:49 +0000 (UTC) Received: (qmail 38254 invoked by uid 500); 2 Apr 2013 19:51:47 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 38112 invoked by uid 500); 2 Apr 2013 19:51:47 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 38104 invoked by uid 99); 2 Apr 2013 19:51:47 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 02 Apr 2013 19:51:47 +0000 X-ASF-Spam-Status: No, hits=-0.0 required=5.0 tests=RCVD_IN_DNSWL_NONE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of kwiley@keithwiley.com designates 69.56.148.7 as permitted sender) Received: from [69.56.148.7] (HELO gateway06.websitewelcome.com) (69.56.148.7) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 02 Apr 2013 19:51:42 +0000 Received: by gateway06.websitewelcome.com (Postfix, from userid 5007) id 5FC9FDC8727D1; Tue, 2 Apr 2013 14:51:22 -0500 (CDT) Received: from gator542.hostgator.com (gator542.hostgator.com [74.54.187.114]) by gateway06.websitewelcome.com (Postfix) with ESMTP id 512F3DC872786 for ; Tue, 2 Apr 2013 14:51:22 -0500 (CDT) Received: from [24.19.6.8] (port=49030 helo=[192.168.10.2]) by gator542.hostgator.com with esmtpa (Exim 4.80) (envelope-from ) id 1UN7En-0005R9-OI for user@hive.apache.org; Tue, 02 Apr 2013 14:51:22 -0500 Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Apple Message framework v1085) Subject: Re: Need rank() From: Keith Wiley In-Reply-To: Date: Tue, 2 Apr 2013 12:51:20 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: <396B47A9-57A2-4E78-89BF-8493C0D8A2C9@keithwiley.com> References: <10A7E036-778F-410E-84B1-4759B5E21D7D@keithwiley.com> <844940D9-C255-41C7-BC72-86BCF8A48F9E@keithwiley.com> To: user@hive.apache.org X-Mailer: Apple Mail (2.1085) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - gator542.hostgator.com X-AntiAbuse: Original Domain - hive.apache.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - keithwiley.com X-BWhitelist: no X-Source: X-Source-Args: X-Source-Dir: X-Source-Sender: ([192.168.10.2]) [24.19.6.8]:49030 X-Source-Auth: kwiley+keithwiley.com X-Email-Count: 3 X-Source-Cap: a2J3aWxleTtrYndpbGV5O2dhdG9yNTQyLmhvc3RnYXRvci5jb20= X-Virus-Checked: Checked by ClamAV on apache.org Yep, the original article is definitely erroneous in this regard. I = figured out that eventually. I'm not sure how much I can trust that = resource now. I may have to look elsewhere. I agree that Edward's = description is pretty good, but as I said earlier, I can't actually use = his code, so I'm trying to cobble a workable solution together from the = various resources available. Ritesh's article, despite the error in the = Hive syntax, is still useful in that it enables one to quickly compile a = simple rank jar without relying on git, maven, or other project = dependencies -- problems which have plagued me with Edward's approach. = So, if I can use Ritesh's method to write a simple rank function, and = Edward's accurate description of how to construct the query, then I can = put all the pieces together into a workable solution. I'll let you know if I get it. On Apr 2, 2013, at 10:56 , Igor Tatarinov wrote: > You are getting the error because you are ORDERing BY rank but rank is = not in the top SELECT >=20 > Also, DISTRIBUTE BY/SORT BY are done after SELECT so you have to use a = subquery: > SELECT ..., rank(user) > FROM (SELECT ... DISTRIBUTE BY ... SORT BY) >=20 > igor > decide.com >=20 >=20 > On Tue, Apr 2, 2013 at 10:03 AM, Keith Wiley = wrote: > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote: >=20 > > = http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-gro= up-in-hadoophive/ >=20 > Is there any possibility there is a bug in Ritesh Agrawal's query = statement from that article? I created a test table with the exact = column names from the example in the article and used a minimally = altered version of the command (I removed the where clause to simplify = things a bit) and got an error which suggests there is something = slightly wrong with the command (or perhaps the table has to be = configured a special way). Here's what I get when I almost perfectly = duplicate that example: >=20 > hive> describe test; > OK > user string > category string > value int > Time taken: 0.082 seconds > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > hive> select * from test; > OK > user1 cat1 1 > user1 cat1 2 > user1 cat1 3 > user1 cat2 10 > user1 cat2 20 > user1 cat2 30 > user2 cat1 11 > user2 cat1 21 > user2 cat1 31 > user2 cat2 5 > user2 cat2 6 > user2 cat2 7 > Time taken: 0.202 seconds > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D > hive> SELECT user, category, value > > FROM ( > > SELECT user, category, rank(user) as rank, value > > FROM test > > DISTRIBUTE BY user > > SORT BY user, value desc > > ) a > > WHERE rank > 1 > > ORDER BY user, rank; > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or = column reference 'rank': (possible column names are: _col0, _col1, = _col2) > hive> >=20 > = __________________________________________________________________________= ______ > Keith Wiley kwiley@keithwiley.com keithwiley.com = music.keithwiley.com >=20 > "What I primarily learned in grad school is how much I *don't* know. > Consequently, I left grad school with a higher ignorance to knowledge = ratio than > when I entered." > -- Keith Wiley > = __________________________________________________________________________= ______ >=20 >=20 = __________________________________________________________________________= ______ Keith Wiley kwiley@keithwiley.com keithwiley.com = music.keithwiley.com "Luminous beings are we, not this crude matter." -- Yoda = __________________________________________________________________________= ______