Message-ID: <21397931.279431294741066286.JavaMail.jira@thor>
Date: Tue, 11 Jan 2011 05:17:46 -0500 (EST)
From: "Stu Hood (JIRA)"
Reply-To: dev@cassandra.apache.org
To: commits@cassandra.apache.org
Subject: [jira] Updated: (CASSANDRA-1472) Add bitmap secondary indexes

     [ https://issues.apache.org/jira/browse/CASSANDRA-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1472:
--------------------------------

    Attachment: 0.7-1472-v6.tgz

> I renamed KEYS_BITMAP to just BITMAP,
> fixed some spots that could leak files, and fixed a compaction bug related
> to 1916 with testcase.

I incorporated your changes into the latest tarball as 0018, and fixed some silliness in 0019 and 0020.

> There are some changes in here that seem to be bug fixes for other issues,
> specifically the changes to CFMetaData.java

Dropped from this patch, and added on CASSANDRA-1962

> I see in SSTableWriter that BMT will fail on secondary indexed CFs now.
> Why fail though? Can't they just be built on restart?

Yes, probably: but the naive approach is not very elegant, since when we see the first BMT append, we'll already have the secondary indexes open, so we need to null them out. A better approach would need to indicate to the SSTW constructor/factory that we were intending to write without certain component types... I think this can go in another ticket?

> The whole BitmapIndexWriter Scratch space has me slightly concerned.

There is an alternative to the layout I've implemented here, but it is slower for the most common query type (equality on one bucket), and only slightly faster for extremely general index queries (LT/GT involving most/all of the buckets). We can measure the actual overhead on a single sstable if you'd like.

> AVRO, I don't see the value here. [...] The value of using our BRAF is you
> have all the work to avoid polluting the page cache

I could go either way on this point: on one hand, this is an extremely simple structure. On the other hand, we get large benefits from compression here, and I'm fairly certain we should use Avro for the rest of the sstable. Also, it's very simple to use our FileDataInput implementations here via Avro's SeekableInput interface, so we don't necessarily need to throw away any effort. See https://github.com/stuhood/cassandra/commit/1a5c9115cb1410519eff15dd3089772b1e550ae7

> I mentioned above that on the fly indexes should be allowed, however this
> can happen in a subsequent ticket if you prefer.

Yes, I'd prefer that.
It will likely be the highest priority of the 4-5 tickets we need to create if/when this issue goes in.

> As Nick mentioned it would be nice to have some stats on the index
> available in JMX, for a subsequent ticket.

Agreed.

> I think this implementation should probably be the only secondary index
> format we support (What's the value of keeping KEYS over this?)

Agreed, pending the optimizations mentioned in previous comments.

> Add bitmap secondary indexes
> ----------------------------
>
>                 Key: CASSANDRA-1472
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1472
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Stu Hood
>             Fix For: 0.7.1
>
>         Attachments: 0.7-1472-v5.tgz, 0.7-1472-v6.tgz, 0019-Rename-bugfixes-and-fileclose.txt, 1472-v3.tgz, 1472-v4.tgz, 1472-v5.tgz, anatomy.png, v4-bench-c32.txt
>
>
> Bitmap indexes are a very efficient structure for dealing with immutable
> data. We can take advantage of the fact that SSTables are immutable by
> attaching them directly to SSTables as a new component (supported by
> CASSANDRA-1471).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.