From: "Hiller, Dean"
To: "user@cassandra.apache.org"
Date: Tue, 2 Oct 2012 13:26:24 -0600
Subject: Re: 1000's of column families
So you're saying that you can access the primary index with a key range, but to access the secondary index, you first need to get all keys and follow up with a multiget, which would use the secondary index to speed the lookup of the matching rows?

Yes, that is how I "believe" it works. I am by no means an expert.

I also wanted to fire off an MR job to process matching rows in the "virtual" CF, ideally running on the nodes where the data is read from. In 0.7, I thought the M/R jobs did not run locally with the data the way Hadoop does. Does anyone know if that is still true, or do they run locally to the data now?

Thanks,
Dean

From: Ben Hood <0x6e6562@gmail.com>
Reply-To: "user@cassandra.apache.org"
Date: Tuesday, October 2, 2012 1:01 PM
To: "user@cassandra.apache.org"
Subject: Re: 1000's of column families

Dean,

On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote:

Because the data for an index is not all together (i.e., you need a multiget to get the data); it is not contiguous.

With a prefix in a partition, the data is kept together, so from what I understand all data for a given prefix is contiguous.

QUESTION: What I don't get in the comment: I assume you are referring to CQL, in which case we would need to specify the partition (in addition to the index), which means all that data is on one node, correct? Or did I miss something there?

Maybe my question was just silly; I wasn't referring to CQL.

As for the locality of the data, I was hoping to be able to fire off an MR job to process all matching rows in the CF. I was assuming that this job would get executed on the same node as the data.

But I think the real confusion in my question has to do with the way ColumnFamilyInputFormat has been implemented, since it appears to ingest the entire (non-OPP) CF into Hadoop, so the predicate has to be applied in the job rather than up front in the Cassandra query.

Cheers,
Ben
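To make the three access patterns in this thread concrete, here is a toy in-memory model in plain Python. It is not Cassandra code: the class ToyColumnFamily and its methods (range_scan, index_lookup, multiget, full_scan) are hypothetical names made up for illustration. It assumes rows are kept sorted by key, roughly like an order-preserving partitioner, and it mirrors the access pattern as described above ("how I believe it works"), not a confirmed account of Cassandra's internals. full_scan models Ben's reading of ColumnFamilyInputFormat, where every row is read and the predicate is applied afterwards.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ToyColumnFamily:
    """Hypothetical in-memory stand-in for a column family (row key -> columns).

    Keys are kept sorted, loosely like an order-preserving partitioner, so a
    key range is one contiguous slice. Illustrative only; not Cassandra's API.
    """

    def __init__(self):
        self._keys = []                  # sorted row keys
        self._rows = {}                  # row key -> dict of columns
        self._index = defaultdict(set)   # (column, value) -> set of row keys

    def insert(self, key, columns):
        if key in self._rows:
            # Remove stale secondary-index entries for the old version of the row.
            for col, val in self._rows[key].items():
                self._index[(col, val)].discard(key)
        else:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._rows[key] = dict(columns)
        for col, val in columns.items():
            self._index[(col, val)].add(key)

    def range_scan(self, start, end):
        """Primary-key access: one contiguous slice of sorted keys in [start, end]."""
        lo = bisect_left(self._keys, start)
        hi = bisect_right(self._keys, end)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

    def index_lookup(self, column, value):
        """Secondary-index step 1: return only the matching row keys."""
        return sorted(self._index.get((column, value), ()))

    def multiget(self, keys):
        """Secondary-index step 2: fetch each row by key; rows need not be adjacent."""
        return [(k, self._rows[k]) for k in keys if k in self._rows]

    def full_scan(self, predicate):
        """Read every row, then filter: the predicate runs in the consumer
        (like a job that ingests the whole CF), not up front in the query."""
        return [(k, self._rows[k]) for k in self._keys if predicate(self._rows[k])]


if __name__ == "__main__":
    cf = ToyColumnFamily()
    cf.insert("a1", {"state": "CO"})
    cf.insert("a2", {"state": "NY"})
    cf.insert("b7", {"state": "CO"})
    cf.insert("c3", {"state": "CO"})

    print([k for k, _ in cf.range_scan("a0", "a9")])     # ['a1', 'a2']
    keys = cf.index_lookup("state", "CO")
    print(keys)                                          # ['a1', 'b7', 'c3']
    print([k for k, _ in cf.multiget(keys)])             # ['a1', 'b7', 'c3']
    print([k for k, _ in cf.full_scan(lambda r: r.get("state") == "CO")])  # ['a1', 'b7', 'c3']
```

The sketch only shows the shape of the reads: range_scan touches one contiguous slice, index_lookup plus multiget fetches rows that may be scattered, and full_scan reads everything before filtering.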