From: "Brian O'Neill"
To: user@cassandra.apache.org
Subject: Re: Solr Use Cases
Date: Wed, 19 Sep 2012 09:41:29 -0400

Roshni,

We're using Solr to support ad hoc queries and fuzzy searches against
unstructured data stored in Cassandra.

Cassandra is great for storage, and you can create data models and indexes
that support your queries, provided you can anticipate those queries. When
you can't anticipate the queries, or if you need to support a large number
of permutations of multi-dimensional queries, you're probably better off
using an index like Solr.

Since Solr only supports a flat document structure, you may need to
transform the data before inserting it into Solr. We chose not to use DSE,
so we used cassandra-triggers as our mechanism to integrate Solr.
(https://github.com/hmsonline/cassandra-triggers)
We intercept the mutation, transform the data into a document (w/
multi-value fields) and POST it to Solr.

More recently though, we're looking to roll out ElasticSearch. As our query
demand increases, we expect Solr to quickly become a PITA to administer
(master->slave relationships). IMHO, ElasticSearch's architecture is a
better match for Cassandra.

We are also looking to replace cassandra-triggers with Storm, allowing us
to build a data processing flow using Cassandra and ElasticSearch bolts.
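To make the "transform into a flat document, then POST" step a bit more concrete, here is a rough Python sketch. It is not the cassandra-triggers code. The function names, the sample row, the underscore-joined field naming, and the Solr update URL are all assumptions made for illustration, and the request is only built here, not sent.

```python
import json
from urllib.request import Request


def flatten_to_solr_doc(row):
    """Flatten a nested row into a flat dict suitable for a flat index.

    Nested dict keys are joined with "_" (an assumed naming scheme), and
    values that occur more than once under the same flattened name are
    collected into a list, i.e. a multi-valued field.
    """
    collected = {}

    def walk(value, name):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{name}_{key}" if name else key)
        elif isinstance(value, list):
            for item in value:
                walk(item, name)
        else:
            collected.setdefault(name, []).append(value)

    walk(row, "")
    # Single values stay scalar; repeated values stay as lists.
    return {k: v[0] if len(v) == 1 else v for k, v in collected.items()}


def build_update_request(solr_url, doc):
    """Build (but do not send) a JSON POST carrying one document.

    solr_url is a hypothetical update endpoint; the exact path depends on
    the Solr version and core configuration.
    """
    body = json.dumps([doc]).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(solr_url, data=body, headers=headers, method="POST")


# Hypothetical row, as it might look after intercepting a mutation.
row = {
    "id": "42",
    "name": {"first": "Ann", "last": "Lee"},
    "phones": ["555-0100", "555-0101"],
    "addresses": [{"city": "Philadelphia"}, {"city": "Boston"}],
}
doc = flatten_to_solr_doc(row)
request = build_update_request("http://localhost:8983/solr/update", doc)
```

Passing `request` to `urllib.request.urlopen` would send it. The multi-valued fields (`phones`, `addresses_city`) would need to be declared as multi-valued in the index schema.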
(We've open sourced the Cassandra bolt, and we'll be open sourcing the
ElasticSearch bolt shortly.)

-brian

On Wed, Sep 19, 2012 at 8:27 AM, Roshni Rajagopal wrote:
> Hi,
>
> I'm new to Solr, and I hear that Solr is a great tool for improving search
> performance.
> I'm unsure whether Solr or DSE Search is a must for all Cassandra
> deployments.
>
> 1. For performance - I thought Cassandra had great read & write performance.
> When should Solr be used?
> Taking the following use cases for Cassandra from the DataStax FAQ page, in
> which cases would Solr be useful, and would it be useful for all of them?
>
> Time series data management
> High-velocity device data ingestion and analysis
> Media streaming (e.g., music, movies)
> Social media input and analysis
> Online web retail (e.g., shopping carts, user transactions)
> Web log management / analysis
> Web click-stream analysis
> Real-time data analytics
> Online gaming (e.g., real-time messaging)
> Write-intensive transaction systems
> Buyer event analytics
> Risk analysis and management
>
>
> 2. What changes to Cassandra data modeling does Solr bring? We have some
> guidelines & best practices around Cassandra data modeling.
> Is Solr so powerful that it does not matter how data is modeled in
> Cassandra? Are there different best practices for Cassandra data modeling
> when Solr is in the picture?
> Is this something we should keep in mind while modeling for Cassandra today,
> so that the data is ready to be used via Solr in the future?
>
> 3. Does Solr come with any drawbacks, like it's not real time?
>
> I can & should read the manual, but it would be great if someone could
> explain at a high level.
>
> Thank you!
>
>
> Regards,
> Roshni

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
Apache Cassandra MVP
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42