Subject: Re: get_indexed_slices ~ simple map-reduce
From: aaron morton
Date: Tue, 14 Jun 2011 11:27:40 +1200
To: user@cassandra.apache.org

From a quick read of the code in o.a.c.db.ColumnFamilyStore.scan()...

Candidate rows are first read by applying the most selective equality predicate.

From those candidate rows...

1) If the SlicePredicate has a SliceRange, query execution will read all columns for the candidate row, provided the byte size of the largest tracked row is less than the column_index_size_in_kb config setting (defaults to 64K). Meaning: if no more than one column index page of columns is (probably) going to be read, they will all be read.

2) Otherwise, the query will read only the columns specified by the SliceRange.

3) If the SlicePredicate uses a list of column names, those columns and the ones referenced in the IndexExpressions (except the one selected as the primary pivot above) are read from disk.

If additional columns are needed (in case 2 above), they are read in separate reads from the candidate row.

Then, when applying the SlicePredicate to produce the final projection into the result set, all the columns required to satisfy the filter will already be in memory.

So, yes, it reads just the columns from disk that you ask for, unless it thinks it will take no more work to read more.
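To make that concrete from the client side, here's a rough, untested sketch against the 0.7/0.8-era Thrift interface. The keyspace, column family ("Items") and column names ("day", "status", "title") are made up for illustration; the point is that the SlicePredicate names columns explicitly (case 3 above), so only those columns plus the ones in the IndexExpressions should be read for each candidate row:

import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.IndexClause;
import org.apache.cassandra.thrift.IndexExpression;
import org.apache.cassandra.thrift.IndexOperator;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class IndexedSliceSketch
{
    static ByteBuffer bytes(String s)
    {
        return ByteBuffer.wrap(s.getBytes(Charset.forName("UTF-8")));
    }

    public static void main(String[] args) throws Exception
    {
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("MyKeyspace");

        // The EQ expression on the indexed "day" column is the primary pivot;
        // the "status" expression is applied to the candidate rows it returns.
        IndexExpression byDay = new IndexExpression(bytes("day"), IndexOperator.EQ, bytes("2011-06-13"));
        IndexExpression byStatus = new IndexExpression(bytes("status"), IndexOperator.EQ, bytes("open"));
        IndexClause clause = new IndexClause(Arrays.asList(byDay, byStatus), bytes(""), 100);

        // Naming columns explicitly (case 3): only these plus the columns in the
        // IndexExpressions need to be read from disk for each candidate row.
        SlicePredicate predicate = new SlicePredicate();
        predicate.setColumn_names(Arrays.asList(bytes("status"), bytes("title")));

        List<KeySlice> rows = client.get_indexed_slices(
            new ColumnParent("Items"), clause, predicate, ConsistencyLevel.ONE);

        for (KeySlice row : rows)
            System.out.println("matched row with " + row.getColumns().size() + " columns");

        transport.close();
    }
}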
Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 13 Jun 2011, at 08:34, Michal Augustýn wrote:

> Hi,
>
> as I wrote, I don't want to install Hadoop etc. - I just want to use
> the Thrift API. The core of my question is how the get_indexed_slices
> function works.
>
> I know that it must get all keys using the equality expression first -
> but what about the additional expressions? Does Cassandra fetch whole
> filtered rows, or just the columns used in the additional filtering
> expressions?
>
> Thanks!
>
> Augi
>
> 2011/6/12 aaron morton:
>> Not exactly sure what you mean here, all data access is through the Thrift
>> API unless you code Java and embed Cassandra in your app.
>> As well as Pig support there is also Hive support in Brisk (which will also
>> have Pig support soon) http://www.datastax.com/products/brisk
>> Can you provide some more info on the use case? Personally, if you have a
>> read query you know you need to support, I would consider supporting it in
>> the data model without secondary indexes.
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> On 11 Jun 2011, at 19:23, Michal Augustýn wrote:
>>
>> Hi all,
>>
>> I'm thinking of the get_indexed_slices function as a simple map-reduce job
>> (one that just maps) - am I right?
>>
>> Well, I would like to be able to run simple queries on values, but I
>> don't want to install Hadoop, write map-reduce jobs in Java (the whole
>> application is in C# and I don't want to introduce a new development
>> stack - maybe Pig would help), or have a second interface to
>> Cassandra (in addition to Thrift). So secondary indexes seem to be
>> the rescue for me. I would have just one indexed column holding a
>> day-timestamp value (~100k items per day), and the equality expression
>> for this column would be in each query (and I would add more ad-hoc
>> expressions).
>> Will this scenario work, or is there some issue I could run into?
>>
>> Thanks!
>>
>> Augi
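A follow-up on the use case quoted above (one indexed day column, ~100k rows per day): get_indexed_slices results are paged via IndexClause.start_key and count, so reading a whole day means looping over pages. A rough, untested sketch, reusing the bytes() helper and Thrift imports from the sketch above; the column family and column names are again hypothetical:

static void scanDay(Cassandra.Client client, String day) throws Exception
{
    SlicePredicate cols = new SlicePredicate();
    cols.setColumn_names(Arrays.asList(bytes("status"), bytes("title")));

    int pageSize = 500;
    ByteBuffer startKey = bytes("");   // empty start key = begin with the first matching row
    while (true)
    {
        IndexClause page = new IndexClause(
            Arrays.asList(new IndexExpression(bytes("day"), IndexOperator.EQ, bytes(day))),
            startKey, pageSize);

        List<KeySlice> rows = client.get_indexed_slices(
            new ColumnParent("Items"), page, cols, ConsistencyLevel.ONE);
        if (rows.isEmpty())
            break;

        for (KeySlice row : rows)
        {
            // process the row here; the first row of every page after the first
            // is a repeat of the previous page's last row and can be skipped
        }

        if (rows.size() < pageSize)
            break;                     // short page, nothing more to fetch
        startKey = ByteBuffer.wrap(rows.get(rows.size() - 1).getKey());
    }
}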