From: "Satuluri, Venu_Madhav"
Reply-To: java-user@lucene.apache.org
Subject: RE: Lucene Seaches VS. Relational database Queries
Date: Thu, 13 Apr 2006 22:52:15 +0530

I am not sure that having one index per table solves the problem.
Going by the schema I put in the earlier mail, you have one index each
for tables A, B and C. What is the Lucene equivalent of this DB query?

    A.field1 == value1 and B.field2 == value2 and C.field3 == value3

You can't use MultiSearcher, because MultiSearcher executes the same
query across all indexes. Since no single index holds all three fields,
it is not going to return any results.

The remaining alternative is to split the query ourselves into 3 parts,
one for each index, and then do the AND operation in our program.
Needless to say, this can get terribly inefficient as query size and
nesting increase.

-----Original Message-----
From: Chris Lu [mailto:chris.lu@gmail.com]
Sent: Thursday, April 13, 2006 10:21 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene Seaches VS. Relational database Queries

I agree with Jelda. Lucene is more document-centric. Storing the
relationship is not a good idea. It's better to simply have 2 indexes.
Usually when users search, they can choose which index they want.
Of course, building the indexes will take more time to process the data.

Lucene cannot replace a relational DB altogether. One reason is that
Lucene is more object-oriented.

Chris Lu
---------------------------------------
Full-Text Lucene Search on Any Databases
http://www.dbsight.net
Faster to Setup than reading marketing materials!

On 4/13/06, Ramana Jelda wrote:
> No.. I don't see that your solution is performant..
> If each Lucene Document corresponds to a row in 'A join B' then the
> index explodes..
> Index size drastically increases.
>
> Why not create two indexes, A and B, then?
> Search in A, and then use the information from the A documents
> obtained to search in B.
>
> It seems to me more performant than indexing all 'A join B' documents.
>
> Any commenters?
>
> Jelda
>
> > -----Original Message-----
> > From: Satuluri, Venu_Madhav [mailto:Venu.Madhav.Satuluri@deshaw.com]
> > Sent: Thursday, April 13, 2006 6:15 PM
> > To: java-user@lucene.apache.org
> > Subject: RE: Lucene Seaches VS. Relational database Queries
> >
> > I think you are asking if we can retain 1:n relationships in Lucene.
> >
> > OK, I'll go out on a limb and give my solution. Say you have a table
> > A and a table B, with B having multiple rows associated with each row
> > in A. Also, your documents are centered around A, i.e. all your
> > queries return some row(s) of A, not B, but you should be able to
> > query on fields in B.
> >
> > In such a case, you need to have multiple documents for each row in
> > A. To be more specific, if a row in A has 5 corresponding rows in B,
> > then there must be 5 Documents in the Lucene index corresponding to
> > that row. In other words, each Lucene Document corresponds to a row
> > in 'A join B'.
> >
> > I am not sure of this scheme. If there are more tables, then this
> > quickly explodes the number of documents. We'll have as many
> > documents as there are rows in {A join B join C join D ...}. Plus,
> > we'll need to remove from the Hits the Documents which correspond
> > logically to the same row in A.
> >
> > Is there a better way to do this? Or am I not making sense?
> >
> > -----Original Message-----
> > From: Ananth T. Sarathy [mailto:ananth.t.sarathy@gmail.com]
> > Sent: Thursday, April 13, 2006 9:04 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene Seaches VS. Relational database Queries
> >
> > Ok,
> > Some of the stuff makes some sense. I was a little loopy from lack
> > of sleep, and some of these solutions don't really cover my
> > concerns....
> >
> > Let's take this movie example.
> > If each member of a production crew can have multiple titles that
> > come from a lookup table of distinct jobs
> >
> > Titles
> > Assistant Producer
> > Producer
> > Executive Producer
> > Director
> > Director Trainee
> > Stunt Director
> >
> > In the database there would be an association table linking each
> > crew member to the titles they had
> >
> > Crew_Titles
> > Crew_ID  Title
> > 1        Producer
> > 1
> >
> > On 4/12/06, Nadav Har'El wrote:
> > >
> > > Chris Hostetter wrote on 12/04/2006 01:41:37 AM:
> > > > : them in one field). One of the problems I see would be with
> > > > : values that overlap (Example, name where one name is Jason
> > > > : Bateman, and one is Jason Bateman Black, and it would be hard
> > > > : to replicate the Discrete Search for
> > > >
> > > > the way field values are "analyzed" is extremely configurable --
> > > > down to the individual field level. Which means that while you
> > > > can have an actor field where you can do loose text searching for
> > > > "bateman" and get back movies starring "Jason Bateman" and "Jason
> > > > Bateman Black" (and even "Guido Batemans" if you use stemming),
> > > > you can also have another field using a KeywordAnalyzer such that
> > > > a record with the values "Jason Bateman" and "Jack Black" will
> > > > only be matched if the user searches for "Jason Bateman" or "Jack
> > > > Black" ... searching for "Bateman Jack" or "Black Jason" will not
> > > > work.
> > >
> > > Another possible trick is to have one field, but mark its beginning
> > > and end with special tokens, say "^" and "$", so that "Jason
> > > Bateman" gets indexed as four tokens:
> > >     ^ Jason Bateman $
> > > Then, if you want to search for the name Jason Bateman and that
> > > name only, just search for the phrase "^ Jason Bateman $" - and
> > > only this entry will match.
> > > (You can also continue to search this field normally.)
> > >
> > > If you think about this, you'll notice that you don't actually
> > > need the beginning-of-field marker ("^"), because it's easy to
> > > recognize the beginning of a field: the position there is 0.
> > > Unfortunately, I don't know how to match position 0 using the
> > > standard QueryParser, but you can do it with the SpanFirstQuery:
> > > for example, if we index Jason Bateman as the three tokens
> > >     Jason Bateman $
> > > then we can search for it using something like
> > >     SpanQuery[] terms = {
> > >         new SpanTermQuery(new Term("actor", "Jason")),
> > >         new SpanTermQuery(new Term("actor", "Bateman")),
> > >         new SpanTermQuery(new Term("actor", "$")) };
> > >     new SpanFirstQuery(new SpanNearQuery(terms, 0, true), 3);
> > > (or something like that... I didn't test this)
> > >
> > > --
> > > Nadav Har'El
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> > --
> > Ananth T Sarathy
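
A rough sketch of the "split the query into one part per index and do
the AND in our program" alternative discussed in this thread. It
assumes, and this is only an assumption, that every index stores the
primary key of the corresponding row in A, so each sub-query can be
reduced to a set of those keys. The names PerIndexAnd and
andAcrossIndexes are hypothetical. Running the sub-queries and
collecting the keys from the Hits is left out; plain sets of integers
stand in for the per-index results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class PerIndexAnd {

    /**
     * ANDs the results of the per-index sub-queries.
     * Each input set holds the (assumed) ids of rows in A that matched
     * the sub-query run against one index.
     */
    public static Set<Integer> andAcrossIndexes(List<Set<Integer>> hitsPerIndex) {
        if (hitsPerIndex.isEmpty()) {
            return new TreeSet<>();
        }
        // Start from the smallest result set so every later
        // intersection touches as few keys as possible.
        List<Set<Integer>> bySize = new ArrayList<>(hitsPerIndex);
        bySize.sort((Set<Integer> a, Set<Integer> b) -> Integer.compare(a.size(), b.size()));

        Set<Integer> result = new TreeSet<>(bySize.get(0));
        for (Set<Integer> hits : bySize.subList(1, bySize.size())) {
            result.retainAll(hits);   // keep only keys matched in this index too
            if (result.isEmpty()) {
                break;                // nothing can match any more
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-ins for: A.field1 == value1, B.field2 == value2, C.field3 == value3
        Set<Integer> fromA = Set.of(1, 2, 3, 5, 8);
        Set<Integer> fromB = Set.of(2, 3, 5, 7);
        Set<Integer> fromC = Set.of(3, 5, 9);
        System.out.println(andAcrossIndexes(List.of(fromA, fromB, fromC)));
    }
}
```

With the sample sets in main this prints [3, 5]. Every sub-query still
has to be materialised in full before the intersection, and nested
OR / NOT clauses would need a union or difference step per node of the
query tree, which is where the inefficiency mentioned above comes from.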