Return-Path: Delivered-To: apmail-lucene-java-user-archive@www.apache.org Received: (qmail 99564 invoked from network); 13 Apr 2006 22:14:50 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur.apache.org with SMTP; 13 Apr 2006 22:14:50 -0000 Received: (qmail 80276 invoked by uid 500); 13 Apr 2006 22:14:44 -0000 Delivered-To: apmail-lucene-java-user-archive@lucene.apache.org Received: (qmail 80245 invoked by uid 500); 13 Apr 2006 22:14:44 -0000 Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-user@lucene.apache.org Delivered-To: mailing list java-user@lucene.apache.org Received: (qmail 80233 invoked by uid 99); 13 Apr 2006 22:14:44 -0000 Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 13 Apr 2006 15:14:44 -0700 X-ASF-Spam-Status: No, hits=2.3 required=10.0 tests=HTML_10_20,HTML_MESSAGE,RCVD_IN_BL_SPAMCOP_NET,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (asf.osuosl.org: domain of erickerickson@gmail.com designates 64.233.166.178 as permitted sender) Received: from [64.233.166.178] (HELO pproxy.gmail.com) (64.233.166.178) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 13 Apr 2006 15:14:42 -0700 Received: by pproxy.gmail.com with SMTP id m51so2461252pye for ; Thu, 13 Apr 2006 15:14:22 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:references; b=i4RiHYWAfJmgOi9wmSa0/vvKUL3fJ4wMevrOWhXkJvuqZZhn1hHonayD76J2SVEJ3Q/FH+1HhxGVjjhk5LuJm6xXDWlyapHbvxP+qpaWi44LIN/na7R3wduyUPA/uawUBteOK+vYTc2+bSqvnsuwy7Ehl8Fjm7ZQX94LCgtnopM= Received: by 10.35.51.6 with SMTP id d6mr1617720pyk; Thu, 13 Apr 2006 15:14:22 -0700 (PDT) Received: by 10.35.17.4 with HTTP; Thu, 13 Apr 2006 15:14:22 -0700 (PDT) Message-ID: <359a92830604131514r626fcf54ue51e226b35c80ed8@mail.gmail.com> Date: Thu, 13 Apr 2006 18:14:22 -0400 From: "Erick Erickson" To: java-user@lucene.apache.org Subject: Re: Lucene Seaches VS. Relational database Queries In-Reply-To: MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_699_11108220.1144966462267" References: <200604131633.k3DGXkQl001419@pigeon.ciao.com> <6e3ae6310604130951y4f107de5u1f8a72ea11d3b547@mail.gmail.com> <359a92830604131030j366a8fd0n1e260170b2e55e27@mail.gmail.com> <359a92830604131206s3c2fac4cp97a84a613503b68b@mail.gmail.com> X-Virus-Checked: Checked by ClamAV on apache.org X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N ------=_Part_699_11108220.1144966462267 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Content-Disposition: inline On 4/13/06, Ananth T. Sarathy wrote: > > No we do have drop downs selects that would allow for the substitution, > but > we also have a free text fields to allow the user to search. That solutio= n > would I think work for the DB query replacement, but you would need a > regular non underscored field to allow for free text. > > Well, as I say, you've solved that problem already. Somewhere, somehow, you have to decide what to do with the "free text" data. Somewhere, somehow, you've got to decide whether "stunt director trainee" means "stunt director= " + trainee, stunt + "director trainee", or stunt + director + trainee. Or else you can't form your SQL in the first place. And the query doesn't produce reasonable results if you *do* form the query. If you can form your SQL with distinct "Title =3D 'blah'" clauses, you can substitute underscores for spaces in the terms. If you can do that, you can ask Lucene to find the terms you indexed with underscores. And if you can't form your SQL queries in the first place, the question is irrelevant. All that said, perhaps a better question is "why is your SQL slow?". Relational databases are really good at this sort of thing. Many smart people have put many, many developer years into making relational databases deal with joins efficiently. Assuming you have the proper indexes etc. As much as I've been impressed with Lucene, I have to ask whether it's relevant to your problem. I have no clue what database you're using, how it's set up, or whether the examples you've given are simplified enough tha= t I don't understand what the *real* problem is. But if your issue isn't really dealing with a full text search, your relational DB should be able t= o handle it, given the proper wherewithal. Have you done "explain plan" or it= s equivalent in your DB? Have you tried adding indexes to avoid full table scans? In short, have you fully convinced yourself that your RDB can't handle the problem? I'm *extremely* leery of introducing another "moving part" into a product without fully exhausting the current parts. It's *never* a good thing to ad= d a new step into the process unless you can convince yourself that it solves more problems than it introduces. You've already alluded to keeping the DB and the Lucene indexes in synch. I *guarantee* that there will be other issues that rise up and bite you. *Count* on whatever you think you'll spen= d in introducing Lucene into your mix (say effort X) costing you *at least* 2= X more time/energy than you think. I'd actually give it a multiplier closer t= o 4X. This is NOT a slam on lucene. But developers often miss the bigger picture. What processes are you going to put in place to keep the Lucene part of the product up to date? How much is it going to cost your company to troubleshoot the Lucene portion? How many company resources are going to be spent answering customer complaints? What is the ongoing maintenance requirement? I like Lucene. I've just persuaded my company to use it in our next product= . I've been incredibly impressed with it's architeture and implementation. Bu= t it's a text search engine, and shouldn't be confused with a RDB. *Assuming* that the RDB is an integral part of your product, I'd spend a lot of time making that do what I needed before I'd introduce another moving part. All for what it's worth, from an old "C" programmer .. Best Erick ------=_Part_699_11108220.1144966462267--