From: "Hiller, Dean"
To: "user@cassandra.apache.org"
Date: Tue, 2 Oct 2012 07:33:08 -0600
Subject: Re: 1000's of CF's. virtual CFs possible Map/Reduce SOLUTION...

Well, I think I know the direction we may follow so we can

 1. Have virtual CF's
 2. Be able to map/reduce ONE virtual CF

Well, not map/reduce exactly, but really, really close. We use PlayOrm
with its partitioning, so I am now thinking what we will do is have a
compute grid where each node does a findAll query into the partitions it
is responsible for. In this way, I think we can have 1000's of virtual
CF's inside ONE CF, and then PlayOrm does its query and retrieves the
rows for that partition of one virtual CF.

Anyone know of a compute grid we can dish out work to? That would be my
only missing piece (well, that and the PlayOrm virtual CF feature, but I
can probably add that within a week, though I am on vacation this
Thursday to Monday).

Later,
Dean

On 10/2/12 6:35 AM, "Hiller, Dean" wrote:

>So basically, with moving towards the 1000's of CF's all being put in one
>CF, our performance is going to tank on map/reduce, correct? I mean, from
>what I remember we could do map/reduce on a single CF, but by stuffing
>1000's of virtual CF's into one CF, our map/reduce will have to read in
>all 999 virtual CF's rows that we don't want just to map/reduce the ONE
>CF.
>
>Map/reduce is VERY VERY SLOW when reading in 1000 times more rows :( :(.
>
>Is this correct? This really sounds like highly undesirable behavior.
>There needs to be a way for people with 1000's of CF's to also run
>map/reduce on any one CF. Doing map/reduce on 1000 times the number of
>rows will be 1000 times slower...and of course, we will most likely get
>up to 20,000 tables from my most recent projections...in our last test
>load, we ended up with 8k+ CF's. Since I kept two other keyspaces,
>Cassandra started getting really REALLY slow when we got up to 15k+ CF's
>in the system, though I didn't look into why.
>
>I don't mind having 1000's of virtual CF's in ONE CF, BUT I need to
>map/reduce "just" the virtual CF!!!!! Ugh.
>
>Thanks,
>Dean
>
>On 10/1/12 3:38 PM, "Ben Hood" <0x6e6562@gmail.com> wrote:
>
>>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill wrote:
>>> It's just a convenient way of prefixing:
>>>
>>>http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html
>>
>>So given that it is possible to use a CF per tenant, should we assume
>>that at sufficient scale there is less overhead to prefix keys than
>>there is to manage multiple CFs?
>>
>>Ben
>
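
For readers of the archive, the idea in the top message (many virtual CF's prefixed into one physical CF, with each compute node querying only the partitions it owns) can be sketched roughly as below. This is only an illustration under stated assumptions: it uses an in-memory dictionary instead of Cassandra, it does not call PlayOrm or Hector, and every name in it (`SharedColumnFamily`, `find_all`, `assign_partitions`, `run_job`, the `virtual_cf:key` layout, the round-robin assignment) is hypothetical rather than anything confirmed in this thread.

```python
from collections import defaultdict


class SharedColumnFamily:
    """In-memory stand-in for ONE physical CF holding many virtual CF's."""

    def __init__(self):
        self.rows = {}                    # full row key -> row data
        self.index = defaultdict(list)    # (virtual_cf, partition) -> row keys

    def put(self, virtual_cf, partition, key, row):
        # Assumed layout: prefix the row key with the virtual CF name.
        full_key = f"{virtual_cf}:{key}"
        if full_key not in self.rows:
            self.index[(virtual_cf, partition)].append(full_key)
        self.rows[full_key] = row

    def find_all(self, virtual_cf, partition):
        """Return only the rows of one partition of one virtual CF."""
        keys = self.index.get((virtual_cf, partition), [])
        return [self.rows[k] for k in keys]

    def partitions(self, virtual_cf):
        return sorted(p for (cf, p) in self.index if cf == virtual_cf)


def assign_partitions(partitions, node_count):
    """Round-robin partitions across compute nodes (hypothetical scheme)."""
    assignment = {n: [] for n in range(node_count)}
    for i, partition in enumerate(partitions):
        assignment[i % node_count].append(partition)
    return assignment


def run_job(store, virtual_cf, node_count, map_fn, reduce_fn):
    """Each node maps over its own partitions; partial results are reduced."""
    assignment = assign_partitions(store.partitions(virtual_cf), node_count)
    partials = []
    for parts in assignment.values():     # sequential here; parallel on a grid
        mapped = [map_fn(row)
                  for p in parts
                  for row in store.find_all(virtual_cf, p)]
        partials.append(reduce_fn(mapped))
    return reduce_fn(partials)


store = SharedColumnFamily()
store.put("sensors", "2012-09", "a", {"value": 1})
store.put("sensors", "2012-10", "b", {"value": 2})
store.put("billing", "2012-10", "a", {"value": 100})

# Only the "sensors" rows are read; the "billing" row is never touched.
total = run_job(store, "sensors", 2, lambda row: row["value"], sum)
print(total)  # 3
```

The point of the per-partition index is that a job over one virtual CF reads only that virtual CF's rows, instead of scanning the other 999 as plain map/reduce over the shared CF would.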