To: user@cassandra.apache.org
From: Oleg Dulin
Subject: Re: Text searches and free form queries
Date: Sat, 6 Oct 2012 09:57:33 -0400

So, what I ended up doing is this --


As I write my records into the main CF, I tokenize some fields that I want to search on using Lucene and write an index into a separate CF, such that my columns are a composite of:


luceneToken:record key


I can then search my records by doing a slice for each lucene token in the search query and then do an intersection of the sets. It works pretty fast.
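
To make the layout concrete, here is a minimal sketch of the idea in Python. It is not the code I actually run: the index CF is simulated with an in-memory sorted list of (token, record key) pairs standing in for composite column names in one wide row, and the hypothetical tokenize() helper is a plain regex split standing in for a Lucene analyzer. The names TokenIndex, add, slice and search are made up for illustration.

```python
import bisect
import re


def tokenize(text):
    # Placeholder for a Lucene analyzer (assumption): lowercase the text
    # and split it on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class TokenIndex:
    """In-memory stand-in for the index CF.

    Columns are (token, record_key) tuples kept in sorted order, which
    mimics how composite column names sort inside a single wide row.
    """

    def __init__(self):
        self._columns = []

    def add(self, record_key, text):
        # Write one index column per distinct token of the record.
        for token in set(tokenize(text)):
            column = (token, record_key)
            pos = bisect.bisect_left(self._columns, column)
            if pos == len(self._columns) or self._columns[pos] != column:
                self._columns.insert(pos, column)

    def slice(self, token):
        # Column slice: every column whose first component equals `token`.
        # (token,) sorts just before any (token, key), so it marks the start.
        start = bisect.bisect_left(self._columns, (token,))
        keys = set()
        for tok, key in self._columns[start:]:
            if tok != token:
                break
            keys.add(key)
        return keys

    def search(self, query):
        # One slice per query token, then intersect the resulting key sets.
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = self.slice(tokens[0])
        for token in tokens[1:]:
            result &= self.slice(token)
        return result


index = TokenIndex()
index.add("r1", "Cassandra text search")
index.add("r2", "Lucene text index")
index.add("r3", "Cassandra and Lucene")
matches = index.search("cassandra lucene")  # only r3 contains both tokens
```

Because the columns sort by token first, each slice is one contiguous range read, so the cost of a query is roughly one range read per token plus the set intersection.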


Regards,

Oleg


On 2012-09-05 01:28:44 +0000, aaron morton said:


AFAIK if you want to keep it inside Cassandra then DSE, roll your own from scratch or start with https://github.com/tjake/Solandra.


Outside of Cassandra I've heard of people using Elastic Search or Solr which I *think* is now faster at updating the index.


Hope that helps.


-----------------

Aaron Morton

Freelance Developer

@aaronmorton

http://www.thelastpickle.com


On 4/09/2012, at 3:00 AM, Andrey V. Panov <panov.andy@gmail.com> wrote:

Someone did search on Lucene, but for very fresh data they build the search index in memory, so the data becomes available for search without delay.


On 3 September 2012 22:25, Oleg Dulin <oleg.dulin@gmail.com> wrote:

Dear Distinguished Colleagues:



--

Regards,

Oleg Dulin

NYC Java Big Data Engineer

http://www.olegdulin.com/
