From: tommaso barbugli
To: "user@cassandra.apache.org"
Date: Mon, 21 Jul 2014 17:13:48 +0200
Subject: Re: estimated row count for a pk range

Thank you for the reply. I was hoping for something with a bit less overhead than the first solution; the second is not really an option for me.

On Monday, 21 July 2014, DuyHai Doan wrote:

> 1) Use a separate counter to count the number of entries in each column family, but it will require you to manage the counting manually.
> 2) SELECT DISTINCT partitionKey FROM .... Normally this query is optimized and is much faster than a SELECT *. However, if you have a very large number of distinct partitions, it can be slow.
>
> On Sun, Jul 20, 2014 at 6:48 PM, tommaso barbugli wrote:
>
>> Hello,
>> Lately I collapsed several (around 1k) column families into a bunch (100) of column families.
>> To keep the data separated I have added an extra column (family) which is part of the PK.
>>
>> While the previous approach allowed me to always have a clear picture of every column family's size, now I have no other option than to select all the rows and make some estimation to guess the overall size used by one of the groups of data in these CFs.
>>
>> e.g.
>> SELECT * FROM cf_shard1 WHERE family = '1';
>>
>> Of course this does not work really well when cf_shard1 has some data in it; is there some way perhaps to get an estimated count for rows matching this query?
>>
>> Thanks,
>> Tommaso

--
sent from iphone (sorry for the typos)

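
For readers of the archive, the first suggestion in the thread (a separately maintained counter) could be sketched roughly as below. This is an illustration under stated assumptions, not something posted by either participant: the table name family_row_counts and its columns shard, family and row_count are hypothetical. Only cf_shard1 and family = '1' come from the original question. The application would have to issue the counter UPDATE itself alongside every insert or delete on the data table, which is the manual bookkeeping the reply warns about.

```cql
-- Hypothetical counter table (all names here are assumptions, not from the thread).
-- In a counter table, every column outside the primary key must be a counter.
CREATE TABLE family_row_counts (
    shard     text,
    family    text,
    row_count counter,
    PRIMARY KEY ((shard), family)
);

-- Issued by the application alongside each new row written to cf_shard1 for family '1'.
UPDATE family_row_counts
   SET row_count = row_count + 1
 WHERE shard = 'cf_shard1' AND family = '1';

-- Issued by the application alongside each row deleted from cf_shard1 for family '1'.
UPDATE family_row_counts
   SET row_count = row_count - 1
 WHERE shard = 'cf_shard1' AND family = '1';

-- Read the estimated row count without scanning the data table.
SELECT row_count
  FROM family_row_counts
 WHERE shard = 'cf_shard1' AND family = '1';
```

The result is best treated as an estimate. Cassandra writes are upserts and counter updates are not idempotent, so the value drifts whenever an existing row is overwritten or a timed-out counter update is retried.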