lucene-java-user mailing list archives

From "Ramana Jelda" <ramana.je...@ciao-group.com>
Subject RE: Lucene Seaches VS. Relational database Queries
Date Thu, 13 Apr 2006 16:33:41 GMT
No, I don't think your solution will perform well.
If each Lucene Document corresponds to a row in 'A join B', the index
explodes: its size increases drastically.

Why not create two indexes, A and B, instead?
Search A first, then use the information from the matching A documents to
search B.

That seems more performant to me than indexing all 'A join B' documents.
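
A rough sketch of the two-step lookup, with plain Java collections standing in for the two indexes. Nothing here is Lucene API: the class, field, and method names are made up for illustration, and the matching rules (substring match on a name in A, exact match on a title in B) are assumptions borrowed from the crew/titles example further down the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for two separate indexes A and B (not Lucene code).
class TwoIndexSearch {

    // One document in index A: a crew member.
    static final class CrewDoc {
        final int id;
        final String name;
        CrewDoc(int id, String name) { this.id = id; this.name = name; }
    }

    // One document in index B: a single title, pointing back to its row in A.
    static final class TitleDoc {
        final int crewId;
        final String title;
        TitleDoc(int crewId, String title) { this.crewId = crewId; this.title = title; }
    }

    // Step 1: search index A and collect the ids of the matching A documents.
    static Set<Integer> searchA(List<CrewDoc> indexA, String nameTerm) {
        Set<Integer> ids = new LinkedHashSet<>();
        String needle = nameTerm.toLowerCase();
        for (CrewDoc doc : indexA) {
            if (doc.name.toLowerCase().contains(needle)) {
                ids.add(doc.id);
            }
        }
        return ids;
    }

    // Step 2: search index B, but only among documents that belong to the
    // A ids found in step 1. The Set removes duplicate A ids automatically.
    static Set<Integer> searchB(List<TitleDoc> indexB, Set<Integer> candidateIds, String title) {
        Set<Integer> ids = new LinkedHashSet<>();
        for (TitleDoc doc : indexB) {
            if (candidateIds.contains(doc.crewId) && doc.title.equalsIgnoreCase(title)) {
                ids.add(doc.crewId);
            }
        }
        return ids;
    }

    // Both steps together: ids of A rows matching the A criterion that also
    // have at least one B row matching the B criterion.
    static List<Integer> search(List<CrewDoc> indexA, List<TitleDoc> indexB,
                                String nameTerm, String title) {
        return new ArrayList<>(searchB(indexB, searchA(indexA, nameTerm), title));
    }
}
```

Each row of A and each row of B is stored once, so the total is |A| + |B| documents rather than one per row of 'A join B'. The price is a second search per query, and the set of A ids passed into step 2 can get large for broad queries.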

Any comments?

Jelda

> -----Original Message-----
> From: Satuluri, Venu_Madhav [mailto:Venu.Madhav.Satuluri@deshaw.com] 
> Sent: Thursday, April 13, 2006 6:15 PM
> To: java-user@lucene.apache.org
> Subject: RE: Lucene Seaches VS. Relational database Queries
> 
> I think you are asking if we can retain 1:n relationships in lucene.
> 
> Ok, I'll go out on a limb and give my solution. Say you have 
> a table A and table B with B having multiple rows associated 
> to each row in A.
> Also your documents are centered around A, i.e. all your 
> queries return some row(s) of A, not B, but you should be 
> able to query on fields in B.
> 
> 
> In such a case, you need to have multiple documents for each row in A.
> To be more specific, if a row in A has 5 corresponding rows 
> in B, then there must be 5 Documents in lucene index 
> corresponding to A. In other words, each lucene Document 
> corresponds to a row in 'A join B'.
> 
> I am not sure about this scheme. With more tables, the number of 
> documents quickly explodes: we'll have as many documents as there 
> are rows in {A join B join C join D.. }. Plus, we'll need to remove 
> from the Hits the Documents that logically correspond to the same 
> row in A.
> 
> Is there a better way to do this? Or am I not making sense?
> 
> 
> -----Original Message-----
> From: Ananth T. Sarathy [mailto:ananth.t.sarathy@gmail.com]
> Sent: Thursday, April 13, 2006 9:04 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene Seaches VS. Relational database Queries
> 
> 
> Ok,
> Some of this makes sense. I was a little loopy from lack of sleep, and
> some of these solutions don't really cover my concerns.
> 
> 
> Let's take this movie example. Suppose each member of a production crew can
> have multiple titles that come from a lookup table of distinct jobs:
> 
> Titles
> Assistant Producer
> Producer
> Executive Producer
> Director
> Director Trainee
> Stunt Director
> 
> In the database there would be an association table linking each crew
> member to the titles they had:
> 
> Crew_Titles
> Crew_ID   Title
> 1             Producer
> 1
> 
> On 4/12/06, Nadav Har'El <NYH@il.ibm.com> wrote:
> >
> > Chris Hostetter <hossman_lucene@fucit.org> wrote on 12/04/2006 01:41:37 AM:
> > > : them in one field).  One of the problems I see would be with values that
> > > : overlap (Example, name where one name is Jason Bateman, and one is Jason
> > > : Bateman Black, and it would be hard to replicate the Discrete Search for
> > >
> > > the way field values are "analyzed" is extremely configurable -- down to
> > > the individual field level.  Which means that while you can have an actor
> > > field where you can do loose text searching for "bateman" and get back
> > > movies starring "Jason Bateman" and "Jason Bateman Black" (and even "Guido
> > > Batemans" if you use stemming) you can also have another field using a
> > > KeywordAnalyzer such that a record with the values "Jason Bateman" and
> > > "Jack Black" will only be matched if the user searches for "Jason Bateman"
> > > or "Jack Black" ... searching for "Bateman Jack" or "Black Jason" will not
> > > work.
> >
> > Another possible trick is to have one field, but mark its beginning and
> > end with special tokens, say "^" and "$", so that "Jason Bateman" gets
> > indexed as four tokens:
> >      ^ Jason Bateman $
> > Then, if you want to search for the name Jason Bateman and that name only,
> > just search for the phrase "^ Jason Bateman $" - and only this entry will
> > match. (You can also continue to search this field normally.)
> >
> > If you think about this, you'll notice that you don't actually need
> > the beginning-of-field marker ("^"), because the beginning of a field is
> > easy to recognize: the position there is 0. Unfortunately,
> > I don't know how to match position 0 using the standard QueryParser,
> > but you can do it with SpanFirstQuery: for example, if we index
> > Jason Bateman as the three tokens
> >      Jason Bateman $
> > then we can search for it using something like
> >      SpanQuery[] terms = {
> >            new SpanTermQuery(new Term("actor", "Jason")),
> >            new SpanTermQuery(new Term("actor", "Bateman")),
> >            new SpanTermQuery(new Term("actor", "$")) };
> >      new SpanFirstQuery(new SpanNearQuery(terms, 0, true), 3);
> > (or something like that... I didn't test this)
> >
> >
> > --
> > Nadav Har'El
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> 
> --
> Ananth T Sarathy
> 
> 
> 



