Date: Thu, 13 Apr 2006 11:33:44 -0400
From: "Ananth T. Sarathy"
To: java-user@lucene.apache.org
Subject: Re: Lucene Searches VS. Relational database Queries

OK, some of this makes sense. I was a little loopy from lack of sleep, and
some of these solutions don't really cover my concerns.

Let's take this movie example. Suppose each member of a production crew can
have multiple titles that come from a lookup table of distinct job titles:

    Assistant Producer
    Producer
    Executive Producer
    Director
    Director Trainee
    Stunt Director

In the database there would be an association table linking each crew member
to the titles they have held:

    Crew_Titles

    Crew_ID  Title
    1        Producer
    1

On 4/12/06, Nadav Har'El wrote:
>
> Chris Hostetter wrote on 12/04/2006 01:41:37 AM:
>
> > : them in one field). One of the problems I see would be with values that
> > : overlap (Example, name where one name is Jason Bateman, and one is Jason
> > : Bateman Black, and it would be hard to replicate the Discrete Search for
> >
> > the way field values are "analyzed" is extremely configurable -- down to
> > the individual field level. Which means that while you can have an actor
> > field where you can do loose text searching for "bateman" and get back
> > movies starring "Jason Bateman" and "Jason Bateman Black" (and even "Guido
> > Batemans" if you use stemming) you can also have another field using a
> > KeywordAnalyzer such that a record with the values "Jason Bateman" and
> > "Jack Black" will only be matched if the user searches for "Jason Bateman"
> > or "Jack Black" ... searching for "Bateman Jack" or "Black Jason" will not
> > work.
>
> Another possible trick is to have one field, but mark its end with special
> tokens, say "^" and "$", so that "Jason Bateman" gets indexed as four
> tokens:
>     ^ Jason Bateman $
> Then, if you want to search for the name Jason Bateman and that name only,
> just search for the phrase "^ Jason Bateman $" - and only this entry will
> match. (you can also continue to search this field normally)
>
> If you'll think about this, you'll notice that you don't actually need
> the beginning-of-field marker ("^") because it's easy to recognize the
> beginning of a field because the position there is 0. Unfortunately,
> I don't know how to match position 0 using the standard QueryParser,
> but you can do it with the SpanFirstQuery: for example if we index
> Jason Bateman as the three tokens
>     Jason Bateman $
> then we can search for it using something like
>     SpanQuery[] terms = {
>         new SpanTermQuery(new Term("actor", "Jason")),
>         new SpanTermQuery(new Term("actor", "Bateman")),
>         new SpanTermQuery(new Term("actor", "$")) };
>     new SpanFirstQuery(new SpanNearQuery(terms, 0, true), 3);
> (or something like that... I didn't test this)
>
>
> --
> Nadav Har'El
>

--
Ananth T Sarathy
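
For anyone following the thread, the end-marker trick quoted above can be
illustrated without Lucene at all. The sketch below is plain Java, not Lucene
code: the class EndMarkerSketch and its methods indexTokens, containsPhrase and
matchesWholeField are hypothetical names made up for this illustration, and
whitespace splitting stands in for whatever analyzer the real index would use.
It only shows why requiring the phrase to start at position 0 and to be followed
by "$" separates "Jason Bateman" from "Jason Bateman Black".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Plain-Java illustration of the end-marker idea; hypothetical, not Lucene API. */
class EndMarkerSketch {
    // End-of-field marker, as suggested in the quoted message.
    static final String END = "$";

    /** Split a field value on whitespace and append the end marker. */
    static List<String> indexTokens(String value) {
        List<String> tokens = new ArrayList<>(Arrays.asList(value.trim().split("\\s+")));
        tokens.add(END);
        return tokens;
    }

    /** Loose match: the phrase occurs as a consecutive run anywhere in the field. */
    static boolean containsPhrase(List<String> field, List<String> phrase) {
        return Collections.indexOfSubList(field, phrase) >= 0;
    }

    /**
     * Exact match: the query tokens plus the end marker must occupy the
     * first positions of the field, i.e. the run starts at position 0.
     */
    static boolean matchesWholeField(List<String> field, String query) {
        List<String> wanted = indexTokens(query);
        return field.size() >= wanted.size()
                && field.subList(0, wanted.size()).equals(wanted);
    }
}
```

With these helpers, a loose phrase search for Jason Bateman still finds both
names, while matchesWholeField accepts only the field that holds exactly that
name. In a real index the position-0 check is what SpanFirstQuery is suggested
for in the quoted message; whether "$" survives analysis depends on the analyzer
chosen for that field.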