From: Ben Hood <0x6e6562@gmail.com>
To: user@cassandra.apache.org
Date: Tue, 2 Oct 2012 14:00:24 +0100
Subject: Re: 1000's of column families

Dean,

On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean wrote:
> Ben,
> to address your question, read my last post but to summarize, yes, there
> is less overhead in memory to prefix keys than manage multiple Cfs EXCEPT
> when doing map/reduce. Doing map/reduce, you will now have HUGE overhead
> in reading a whole slew of rows you don't care about as you can't
> map/reduce a single virtual CF but must map/reduce the whole CF wasting
> TONS of resources.

That's a good point that I hadn't considered beforehand, especially as
I'd like to run MR jobs against these CFs.

Is this limitation inherent in the way that Cassandra is modelled as
input for Hadoop or could you write a custom slice query to only feed
in one particular prefix into Hadoop?

Cheers,

Ben
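
A rough illustration of the trade-off discussed in this thread, written as a plain-Python model and not as Cassandra or Hadoop code. The row keys, the "orders:" and "users:" prefixes, and the function names are all hypothetical. The sketch assumes that a prefix-restricted scan is only possible when rows are stored in key order (as with an order-preserving partitioner); with hash-ordered storage the rows sharing a prefix are not contiguous, so a job has to read everything and filter.

```python
from bisect import bisect_left

# Hypothetical data: one physical CF holding two "virtual CFs",
# told apart only by a key prefix.
rows = {
    "orders:1": {"total": 10},
    "orders:2": {"total": 25},
    "users:1": {"name": "a"},
    "users:2": {"name": "b"},
    "users:3": {"name": "c"},
}

def full_scan(rows, prefix):
    """Read every row and keep only the ones with the wanted prefix."""
    read = 0
    kept = []
    for key in rows:
        read += 1
        if key.startswith(prefix):
            kept.append(key)
    return kept, read

def range_scan(sorted_keys, prefix):
    """Seek to the prefix in key-ordered storage and stop once it ends."""
    start = bisect_left(sorted_keys, prefix)
    read = 0
    kept = []
    for key in sorted_keys[start:]:
        read += 1
        if not key.startswith(prefix):
            break  # first key past the prefix: nothing further can match
        kept.append(key)
    return kept, read

sorted_keys = sorted(rows)
print(full_scan(rows, "orders:"))         # matching keys, rows read
print(range_scan(sorted_keys, "orders:")) # matching keys, rows read
```

In this toy model the full scan touches all five rows to find two, while the range scan touches three (the two matches plus the first non-matching key). Whether a real job can take the second path depends on the partitioner and on what the Hadoop input format exposes, which is the open question above.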