From: Jonathan Ellis
Date: Mon, 1 Feb 2010 10:41:28 -0600
Subject: Re: bitmap slices
To: cassandra-dev@incubator.apache.org

I don't think this is very useful for column names.  I could see it
being useful for values, but if we're going to add predicate queries
then I'd rather do something more general.

2010/2/1 Ted Zlatanov:
> On Mon, 1 Feb 2010 09:42:16 -0600 Jonathan Ellis wrote:
>
> JE> 2010/2/1 Ted Zlatanov:
>>> On Fri, 29 Jan 2010 15:07:01 -0600 Ted Zlatanov wrote:
>>>
> TZ> On Fri, 29 Jan 2010 12:06:28 -0600 Jonathan Ellis wrote:
> JE> On Fri, Jan 29, 2010 at 9:09 AM, Mehar Chaitanya wrote:
>>>>>>   1. This would lead to an enormous amount of duplication of data; in short,
>>>>>>   if I now want to view the data from the IS_PUBLISHED dimension then my database
>>>>>>   size would scale up tremendously.
>>>
> JE> Yes.  But disk space is so cheap it's worth using a lot of it to make
> JE> other things fast.
>>>
> TZ> IIUC, Mehar would be duplicating the article data for every article tag.
>>>
> TZ> I searched the bug tracker and wiki and didn't find anything on the
> TZ> topic of tag storage and search, so I don't think Cassandra supports
> TZ> tags without data duplication.
>>>
> TZ> Would it be possible to implement an optional byte[] bitmap field in
> TZ> SliceRange?  If you can specify the bitmap as an optional field it would
> TZ> not break current clients.  Then the search can return only the subset
> TZ> of the range that matches the bitmap.  This would make sense for
> TZ> BytesType and LongType, at least.
>>>
>>> I looked at the source code and it seems that
>>> StorageProxy::getSliceRange() is the focal point for reads and bitmap
>>> matching should be implemented there.  The bitmap could be applied as a
>>> filter before the other SliceRange parameters, especially the max number
>>> of return results.  It may be worth the effort to send the bitmap down
>>> to the ReadCommand/ColumnFamily level to reduce the number of potential
>>> matches.
>>>
>>> If this is not feasible for technical reasons I'd like to know.
>>> Otherwise I'll put it on my TODO list and produce a proposal (unless
>>> someone more knowledgeable is interested, of course).
>
> JE> how would this be different than the byte[] column name you can
> JE> already match on?
>
> Given byte columns
>
> A 0110
> B 0111
> C 0101
>
> the bitmask approach would let you specify a bitmask of "0011" and get
> only B.  It's just an AND that looks for a non-zero value.  So you can
> say "0111" and get A, B, and C.  Or "0010" to get A and B.  "1000" gets
> nothing.
>
> Cassandra could support OR-ed multiples for better queries, so you could
> ask for (0001,0010) to get A, B, and C.
>
> Ted
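
For reference, the matching rule quoted above can be sketched in a few
lines of Python.  This is only an illustration under assumptions: the
column values come from Ted's example, while match_any and match_all are
hypothetical helper names, not Cassandra APIs, and nothing here reflects
how SliceRange or StorageProxy actually work.  match_any follows the rule
as stated (an AND that yields a non-zero value, OR-ed across several
masks); match_all is an alternative reading in which every bit of the
mask must be set.  Under the non-zero rule, "0011" appears to match A, B,
and C; the "only B" result corresponds to the all-bits reading.

```python
from typing import Dict, Iterable, List

# Column names and 4-bit values taken from the example in the thread.
COLUMNS: Dict[str, int] = {"A": 0b0110, "B": 0b0111, "C": 0b0101}


def match_any(columns: Dict[str, int], masks: Iterable[int]) -> List[str]:
    """Hypothetical helper: names whose value shares a set bit with any mask.

    This is the "AND that looks for a non-zero value" rule, with several
    masks OR-ed together.
    """
    mask_list = list(masks)
    return [name for name, value in columns.items()
            if any((value & mask) != 0 for mask in mask_list)]


def match_all(columns: Dict[str, int], mask: int) -> List[str]:
    """Hypothetical helper: names whose value has every bit of the mask set."""
    return [name for name, value in columns.items()
            if (value & mask) == mask]


if __name__ == "__main__":
    print(match_any(COLUMNS, [0b0111]))          # A, B and C
    print(match_any(COLUMNS, [0b0010]))          # A and B
    print(match_any(COLUMNS, [0b1000]))          # nothing
    print(match_any(COLUMNS, [0b0001, 0b0010]))  # A, B and C
    print(match_any(COLUMNS, [0b0011]))          # A, B and C under this rule
    print(match_all(COLUMNS, 0b0011))            # only B
```

Which of the two rules a real bitmap field would use is a design choice
the thread leaves open; the sketch only makes the difference visible.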