Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of
 gcdcu-cassandra-user-1@m.gmane.org designates 80.91.229.3 as permitted
 sender)
To: user@cassandra.apache.org
From: Oleg Dulin <oleg.dulin@gmail.com>
Subject: Re: Text searches and free form queries
Date: Sat, 6 Oct 2012 09:57:33 -0400
Lines: 120
Message-ID: <k4pdcc$88p$1@ger.gmane.org>
References: <k22b41$evb$1@ger.gmane.org>
 <CAJciDs1Qd82wUyC+AtqprkedizUV0YVpd2HVQ5oVuj4u5bi8Og@mail.gmail.com>
 <2E50C3F9-64FD-49B0-B6A5-AA5ACBE38DAD@thelastpickle.com>
Mime-Version: 1.0
Content-Type: multipart/alternative;
 boundary=--------------14526896961120048547
User-Agent: Unison/2.1.9

This is a multi-part message in MIME format.

----------------14526896961120048547
Content-Type: text/plain; charset=iso-8859-1; format=flowed
Content-Transfer-Encoding: 8bit

So, what I ended up doing is this --

As I write my records into the main CF, I tokenize some fields that I 
want to search on using Lucene and write an index into a separate CF, 
such that my columns are a composite of:

luceneToken:record key

I can then search my records by doing a column slice for each Lucene 
token in the search query and then intersecting the resulting sets of 
record keys. It works pretty fast.
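
For anyone who wants to see the shape of it, here is a rough sketch in 
Python of the approach described above. It is an in-memory stand-in, not 
my actual code: the TokenIndex class, its method names, the single 
sorted list that plays the role of the index CF, and the regex tokenizer 
(a crude substitute for a Lucene analyzer) are all assumptions made up 
for illustration.

```python
import bisect
import re


class TokenIndex:
    """Hypothetical in-memory model of the index CF: a sorted list of
    (token, record_key) composite column names."""

    def __init__(self):
        # Kept sorted, the way columns are ordered by the comparator.
        self.columns = []

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase and split on non-alphanumerics, standing
        # in for whatever Lucene analyzer is really used.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def index_record(self, record_key, text):
        # Write one composite column per token of the searchable field.
        for token in self.tokenize(text):
            column = (token, record_key)
            pos = bisect.bisect_left(self.columns, column)
            if pos == len(self.columns) or self.columns[pos] != column:
                self.columns.insert(pos, column)

    def slice_token(self, token):
        # Column slice: all composites whose first component is `token`.
        start = bisect.bisect_left(self.columns, (token,))
        keys = set()
        for tok, key in self.columns[start:]:
            if tok != token:
                break
            keys.add(key)
        return keys

    def search(self, query):
        # One slice per query token, then intersect the key sets.
        tokens = self.tokenize(query)
        if not tokens:
            return set()
        return set.intersection(*(self.slice_token(t) for t in tokens))


index = TokenIndex()
index.index_record("rec1", "Apache Cassandra column families")
index.index_record("rec2", "Apache Lucene tokenizers")
index.index_record("rec3", "Cassandra and Lucene together")
print(index.search("cassandra lucene"))  # {'rec3'}
```

In a real cluster each slice_token call would be a column slice against 
the index CF, and the intersection happens on the client. How the index 
rows are keyed is left out of this sketch.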

Regards,
Oleg

On 2012-09-05 01:28:44 +0000, aaron morton said:

> AFAIK, if you want to keep it inside Cassandra, then the options are 
> DSE, rolling your own from scratch, or starting with 
> https://github.com/tjake/Solandra.
> 
> Outside of Cassandra I've heard of people using Elastic Search or Solr 
> which I *think* is now faster at updating the index.
> 
> Hope that helps.
> 
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 4/09/2012, at 3:00 AM, Andrey V. Panov <panov.andy@gmail.com> wrote:
> Someone did search with Lucene, but for very fresh data they build the 
> search index in memory, so the data becomes available for search 
> without delay.
> 
> On 3 September 2012 22:25, Oleg Dulin <oleg.dulin@gmail.com> wrote:
> Dear Distinguished Colleagues:


-- 
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/
----------------14526896961120048547
Content-Type: text/html; charset=iso-8859-1
Content-Transfer-Encoding: 8bit

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta http-equiv="Content-Style-Type" content="text/css">
<title></title>
<meta name="Generator" content="Cocoa HTML Writer">
<meta name="CocoaVersion" content="1187.34">
<style type="text/css">
p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; line-height: 15.0px; font: 12.0px Helvetica}
p.p2 {margin: 0.0px 0.0px 0.0px 0.0px; line-height: 15.0px; font: 12.0px Helvetica; min-height: 14.0px}
p.p3 {margin: 0.0px 0.0px 0.0px 12.0px; font: 16.0px Helvetica; color: #011892}
p.p4 {margin: 0.0px 0.0px 0.0px 12.0px; font: 16.0px Helvetica; color: #011892; min-height: 19.0px}
p.p5 {margin: 0.0px 0.0px 0.0px 0.0px; line-height: 14.0px; font: 12.0px Helvetica; min-height: 14.0px}
p.p6 {margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px}
p.p7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; color: #929292}
span.s1 {text-decoration: underline}
span.s2 {direction: ltr; unicode-bidi: embed}
span.s3 {text-decoration: underline ; direction: ltr; unicode-bidi: embed}
</style>
</head>
<body>
<p class="p1">So, what I ended up doing is this --</p>
<p class="p2"><br></p>
<p class="p1">As I write my records into the main CF, I tokenize some fields that I want to search on using Lucene and write an index into a separate CF, such that my columns are a composite of:</p>
<p class="p2"><br></p>
<p class="p1">luceneToken:record key</p>
<p class="p2"><br></p>
<p class="p1">I can then search my records by doing a column slice for each Lucene token in the search query and then intersecting the resulting sets of record keys. It works pretty fast.</p>
<p class="p2"><br></p>
<p class="p1">Regards,</p>
<p class="p1">Oleg</p>
<p class="p2"><br></p>
<p class="p1">On 2012-09-05 01:28:44 +0000, aaron morton said:</p>
<p class="p2"><br></p>
<p class="p3">AFAIK, if you want to keep it inside Cassandra, then the options are DSE, rolling your own from scratch, or starting with <a href="https://github.com/tjake/Solandra"><span class="s1">https://github.com/tjake/Solandra</span></a>.</p>
<p class="p4"><br></p>
<p class="p3">Outside of Cassandra I've heard of people using Elastic Search or Solr which I *think* is now faster at updating the index.</p>
<p class="p4"><br></p>
<p class="p3">Hope that helps.</p>
<p class="p4"><br></p>
<p class="p3">&nbsp;</p>
<p class="p3">-----------------</p>
<p class="p3">Aaron Morton</p>
<p class="p3">Freelance Developer</p>
<p class="p3">@aaronmorton</p>
<p class="p3"><span class="s1"><a href="http://www.thelastpickle.com/">http://www.thelastpickle.com</a></span></p>
<p class="p4"><br></p>
<p class="p3">On 4/09/2012, at 3:00 AM, Andrey V. Panov &lt;<a href="mailto:panov.andy@gmail.com"><span class="s1">panov.andy@gmail.com</span></a>&gt; wrote:</p>
<p class="p3">Someone did search with Lucene, but for very fresh data they build the search index in memory, so the data becomes available for search without delay.</p>
<p class="p4"><br></p>
<p class="p3">On 3 September 2012 22:25, Oleg Dulin <span class="s2">&lt;<a href="mailto:oleg.dulin@gmail.com"><span class="s3">oleg.dulin@gmail.com</span></a>&gt;</span> wrote:</p>
<p class="p3">Dear Distinguished Colleagues:</p>
<p class="p5"><br></p>
<p class="p6"><br></p>
<p class="p7">--<span class="Apple-converted-space">&nbsp;</span></p>
<p class="p7">Regards,</p>
<p class="p7">Oleg Dulin</p>
<p class="p7">NYC Java Big Data Engineer</p>
<p class="p7">http://www.olegdulin.com/</p>
</body>
</html>
----------------14526896961120048547--