Date: Mon, 19 May 2014 19:16:33 +0530
Subject: Re: hbase key design to efficient query on base of 2 or more column
From: Shushant Arora
To: user@hbase.apache.org

OK, but what if I have two multi-valued dimensions on which I have to analyse the number of users? Say category can have 50 values and the other dimension is the user's country (say 100+ values).
I need a weekly count by category and country, plus an overall distinct user count by category and country.
How can I achieve this in HBase?

On Mon, May 19, 2014 at 3:11 PM, Michael Segel wrote:

> The point is that choosing a field that has a small finite set of values
> is not a good candidate for indexing using an inverted table or b-tree etc…
>
> I'd say that you're actually going to be better off using a scan with a
> start and stop row, then doing the counts on the client side.
>
> So as you get back your result set… you process the data. (Either in a M/R
> job or single client thread.)
>
> HTH
>
> On May 19, 2014, at 8:48 AM, Shushant Arora wrote:
>
> > I cannot apply server side filter.
> > 2nd requirement is not just get users with supreme category rather
> > distribution of users category wise.
> >
> > 1.How many of supreme , how many of normal and how many of medium till date.
> >
> >
> > On Mon, May 19, 2014 at 12:58 PM, Michael Segel wrote:
> >
> >> Whoa!
> >>
> >> BAD BOY. This isn't a good idea for secondary index.
> >>
> >> You have a row key (primary index) which is time.
> >> The secondary is a filter… with 3 choices.
> >>
> >> HINT: Do you really want a secondary index based on a field that only has
> >> 3 choices for a value?
> >>
> >> What are they teaching in school these days?
> >>
> >> How about applying a server side filter? ;-)
> >>
> >> On May 18, 2014, at 12:33 PM, John Hancock wrote:
> >>
> >>> Shushant,
> >>>
> >>> Here's one idea, there might be better ways.
> >>>
> >>> Take a look at phoenix it supports secondary indexing:
> >>> http://phoenix.incubator.apache.org/secondary_indexing.html
> >>>
> >>> -John
> >>>
> >>>
> >>> On Sat, May 17, 2014 at 8:34 AM, Shushant Arora wrote:
> >>>
> >>>> Hi
> >>>>
> >>>> I have a requirement to query my data base on date and user category.
> >>>> User category can be Supreme,Normal,Medium.
> >>>>
> >>>> I want to query how many new users are there in my table from date range
> >>>> (2014-01-01) to (2014-05-16) category wise.
> >>>>
> >>>> Another requirement is to query how many users of Supreme category are
> >>>> there in my table Broken down wise month in which they came.
> >>>>
> >>>> What should be my key
> >>>> 1.If i take key as combination of date#category. I cannot query based on
> >>>> category?
> >>>> 2.If I take key as category#date I cannot query based on date.
> >>>>
> >>>>
> >>>> Thanks
> >>>> Shushant.
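
For illustration only, here is a minimal sketch of the approach suggested in the quoted reply (a scan with a start and stop row, with the counting done on the client side), applied to the category and country question. It does not use the HBase client API: the table is simulated as a sorted in-memory list, and the row key layout `<yyyy-mm-dd>#<user_id>`, the column names `category` and `country`, and the helpers `scan` and `aggregate` are all assumptions made up for this example. The weekly figure counts rows, so it equals a user count only if each user has at most one row per week.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date

# Hypothetical stand-in for an HBase table: rows kept sorted by row key.
# Assumed row key layout: "<yyyy-mm-dd>#<user_id>" (time-leading key).
ROWS = sorted([
    ("2014-01-03#u1", {"category": "Supreme", "country": "IN"}),
    ("2014-01-04#u2", {"category": "Normal", "country": "US"}),
    ("2014-01-09#u3", {"category": "Supreme", "country": "IN"}),
    ("2014-01-10#u1", {"category": "Supreme", "country": "IN"}),
    ("2014-05-20#u4", {"category": "Medium", "country": "US"}),
])


def scan(rows, start_row, stop_row):
    """Return rows with start_row <= key < stop_row (stop row is exclusive)."""
    keys = [key for key, _ in rows]
    return rows[bisect_left(keys, start_row):bisect_left(keys, stop_row)]


def aggregate(rows, start_row, stop_row):
    """Count on the client side while iterating over the scanned range."""
    weekly = defaultdict(int)    # (iso_year, iso_week, category, country) -> row count
    distinct = defaultdict(set)  # (category, country) -> set of user ids
    for key, cols in scan(rows, start_row, stop_row):
        day, user_id = key.split("#", 1)
        iso_year, iso_week, _ = date.fromisoformat(day).isocalendar()
        dims = (cols["category"], cols["country"])
        weekly[(iso_year, iso_week) + dims] += 1
        distinct[dims].add(user_id)
    return dict(weekly), {dims: len(users) for dims, users in distinct.items()}


# Date range 2014-01-01 to 2014-05-16 inclusive, so the stop row is the next day.
weekly, distinct = aggregate(ROWS, "2014-01-01", "2014-05-17")
```

With 50 categories and 100+ countries the two dictionaries hold at most a few thousand keys per week, but the distinct-user sets grow with the number of users, so a single client thread may not be enough for a large table; the quoted reply mentions an M/R job as the alternative.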