Date: Thu, 4 Aug 2011 22:35:27 +0000 (UTC)
From: "Todd Nine (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Message-ID: <828036051.9692.1312497327430.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1724459716.943.1311012357254.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Issue Comment Edited] (CASSANDRA-2915) Lucene based Secondary Indexes

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079653#comment-13079653 ]

Todd Nine edited comment on CASSANDRA-2915 at 8/4/11 10:33 PM:
---------------------------------------------------------------

Hey guys. We're doing something similar in the Hector JPA plugin. Would using dynamic composites within Cassandra alleviate the need for Lucene documents? We're using this in secondary indexing, and it gives us order by semantics and AND (intersection). The largest issue becomes iteration with OR clauses. AND clauses can be compressed into a single column for efficient range scans; we then use iterators to UNION the OR trees together, with the order clauses in the composites. The caveat is that the user must define indexes with order semantics up front. However, this can easily be added to the existing secondary indexing clauses.

was (Author: tnine):
Hey guys. We're doing something similar in the Hector JPA plugin. Would using dynamic composites within Cassandra alleviate the need for Lucene documents? We're using this in secondary indexing, and it gives us order by semantics and AND (intersection). The largest issue becomes iteration with OR clauses. AND clauses can be compressed into a single column for efficient range scans; we then use iterators to UNION the OR trees together, with the order clauses in the composites.

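The composite-column idea in the comment above can be illustrated with a small sketch. This is not the Hector JPA plugin's code or any Cassandra API. The index layout `(status, age, row_key)`, the sample data, and the function names `range_scan`, `and_query`, and `or_query` are all assumptions made for illustration. It only shows how an AND clause might collapse into one contiguous range over sorted composites, and how OR branches might be unioned by merging already-ordered iterators.

```python
import bisect
import heapq

# Hypothetical index row: each column name is a composite (status, age, row_key).
# The list is kept sorted, standing in for the column comparator's ordering.
INDEX_COLUMNS = sorted([
    ("active", 25, "u3"),
    ("active", 31, "u1"),
    ("active", 40, "u7"),
    ("inactive", 22, "u2"),
    ("inactive", 31, "u5"),
])


def range_scan(columns, start, end):
    """Return every composite c with start <= c < end as one contiguous slice."""
    lo = bisect.bisect_left(columns, start)
    hi = bisect.bisect_left(columns, end)
    return columns[lo:hi]


def and_query(status, min_age, max_age):
    """status == X AND min_age <= age <= max_age, answered by a single range scan.

    A shorter tuple sorts before any longer tuple sharing its prefix, so
    (status, min_age) is an inclusive lower bound and (status, max_age + 1)
    is an exclusive upper bound for integer ages.
    """
    return range_scan(INDEX_COLUMNS, (status, min_age), (status, max_age + 1))


def or_query(branches):
    """Union several AND branches, each already ordered by (age, row_key)."""
    ordered = [[(age, key) for _, age, key in branch] for branch in branches]
    result = []
    # heapq.merge lazily merges sorted inputs; equal neighbours are duplicates.
    for item in heapq.merge(*ordered):
        if not result or result[-1] != item:
            result.append(item)
    return result


if __name__ == "__main__":
    print(and_query("active", 25, 40))
    print(or_query([and_query("active", 30, 45), and_query("inactive", 20, 35)]))
```

In this sketch the sort order is fixed by the position of `age` inside the composite, which mirrors the caveat that order semantics have to be chosen when the index is defined.
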
> Lucene based Secondary Indexes
> ------------------------------
>
> Key: CASSANDRA-2915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: T Jake Luciani
> Labels: secondary_index
> Fix For: 1.0
>
>
> Secondary indexes (of type KEYS) suffer from a number of limitations in their current form:
> - Multiple IndexClauses only work when there is a subset of rows under the highest clause
> - One new column family is created per index; this means 10 new CFs for 10 secondary indexes
> This ticket will use the Lucene library to implement secondary indexes as one index per CF, and utilize the Lucene query engine to handle multiple index clauses. Also, by using Lucene we get a highly optimized file format.
> There are a few parallels we can draw between Cassandra and Lucene.
> Lucene indexes segments in memory, then flushes them to disk, so we can sync our memtable flushes to Lucene flushes. Lucene also has optimize(), which correlates to our compaction process, so these can be synced as well.
> We will also need to correlate column validators to Lucene tokenizers, so the data can be stored properly. The big win is that once this is done, we can perform complex queries within a column, like wildcard searches.
> The downside of this approach is that we will need to read before write, since documents in Lucene are written as complete documents. For random workloads with lots of indexed columns, this means we need to read the document from the index, update it, and write it back.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
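
The read-before-write cost mentioned in the issue description can be shown with a toy model. This is a minimal sketch and does not use Lucene's API. The class `WholeDocumentIndex` and its methods are hypothetical, and a plain dict stands in for the index. The only point is that when documents can only be written whole, changing one indexed column forces a read of the existing document first.

```python
class WholeDocumentIndex:
    """Toy index that can only store or replace complete documents (hypothetical)."""

    def __init__(self):
        self._docs = {}
        self.reads = 0  # counts index reads so the extra cost is visible

    def replace(self, row_key, document):
        """Write a complete document, discarding whatever was stored before."""
        self._docs[row_key] = dict(document)

    def read(self, row_key):
        """Fetch the full stored document (empty if the row is not indexed yet)."""
        self.reads += 1
        return dict(self._docs.get(row_key, {}))

    def update_column(self, row_key, column, value):
        """Change one column: read the document, modify it, write it back."""
        document = self.read(row_key)      # the read before the write
        document[column] = value
        self.replace(row_key, document)


if __name__ == "__main__":
    index = WholeDocumentIndex()
    index.update_column("u1", "name", "ann")
    index.update_column("u1", "age", 31)
    print(index.read("u1"), index.reads)
```

Each call to `update_column` performs one read, so a workload that touches many indexed columns one at a time pays that read on every write.
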