lucene-java-user mailing list archives

From "Satuluri, Venu_Madhav" <>
Subject RE: Lucene Seaches VS. Relational database Queries
Date Thu, 13 Apr 2006 16:14:58 GMT
I think you are asking whether we can retain 1:n relationships in Lucene.

Ok, I'll go out on a limb and give my solution. Say you have a table A
and a table B, with B having multiple rows associated with each row in A.
Also, your documents are centered around A, i.e. all your queries return
some row(s) of A, not B, but you should be able to query on fields in B.

In such a case, you need to have multiple documents for each row in A.
To be more specific, if a row in A has 5 corresponding rows in B, then
there must be 5 Documents in the Lucene index corresponding to that row
of A. In other words, each Lucene Document corresponds to a row in
'A join B'.
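
A minimal sketch of that flattening, in plain Java with no Lucene calls.
The class name, method name and field names ("a_id", "a_title",
"b_value") are hypothetical, chosen only for illustration. Each field map
stands in for one Lucene Document built from one row of 'A join B':

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one "document" (a plain field map) per row of A join B.
class JoinFlattener {

    /** Builds one field map per (A row, B row) pair; every map repeats A's key and fields. */
    static List<Map<String, String>> flatten(String aId, String aTitle, List<String> bValues) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (String b : bValues) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("a_id", aId);       // key of the row in A, repeated on every document
            doc.put("a_title", aTitle); // a field taken from A
            doc.put("b_value", b);      // the single B row this document stands for
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        // One row of A with three associated rows in B yields three documents.
        for (Map<String, String> doc : flatten("1", "Some Movie", Arrays.asList("x", "y", "z"))) {
            System.out.println(doc);
        }
    }
}
```

Note that, as with an inner join, a row of A with no rows in B produces no
document at all in this sketch, so it could never be returned by a query.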

I am not sure about this scheme. With more tables, the number of
documents quickly explodes: we'll have as many documents as there are
rows in {A join B join C join D ...}. Plus, we'll need to remove from the
Hits the duplicate Documents that logically correspond to the same row in A.
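
The duplicate removal could look roughly like this. It is again a plain
Java sketch, assuming each hit carries the key of its row in A (for
example in a stored field) and that the keys arrive in ranked order; the
class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch: collapse ranked hits so each row of A appears only once.
class HitCollapser {

    /** Keeps the first (best-ranked) occurrence of each A key and preserves the ranking order. */
    static List<String> collapse(List<String> rankedAIds) {
        // LinkedHashSet drops repeats while remembering first-insertion order.
        return new ArrayList<>(new LinkedHashSet<>(rankedAIds));
    }

    public static void main(String[] args) {
        // Keys of A as they might come back from the joined index, best hit first.
        List<String> hits = Arrays.asList("7", "3", "7", "7", "3", "9");
        System.out.println(collapse(hits)); // prints [7, 3, 9]
    }
}
```

This costs one pass over the hits, but the total hit count still reflects
rows of 'A join B', not rows of A.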

Is there a better way to do this? Or am I not making sense?

-----Original Message-----
From: Ananth T. Sarathy [] 
Sent: Thursday, April 13, 2006 9:04 PM
Subject: Re: Lucene Seaches VS. Relational database Queries

Some of the stuff makes some sense. I was a little loopy from lack of
sleep, and some of these solutions don't really cover my concerns....

Let's take this movie example. Say each member of a production crew can
have multiple titles that come from a lookup table of distinct jobs:

Assistant Producer
Executive Producer
Director Trainee
Stunt Director

In the database there would be an association table linking each crew
member to the titles they had:

Crew_ID   Title
1             Producer

On 4/12/06, Nadav Har'El <> wrote:
> Chris Hostetter <> wrote on 12/04/2006
> AM:
> > : them in one field).  One of the problems I see would be with
> that
> > : over lap (Example, name where one name is Jason Bateman, and one
> Jason
> > : Bateman Black, and it would be hard to replicate the Discrete
> for
> >
> > The way field values are "analyzed" is extremely configurable -- to
> > the individual field level.  Which means that while you can have an actor
> > field where you can do loose text searching for "bateman" and get
> > movies starring "Jason Bateman" and "Jason Bateman Black" (and even
> > Batemans" if you use stemming) you can also have another field using
> > KeywordAnalyzer such that a record with the values "Jason Bateman" and
> > "Jack Black" will only be matched if the user searches for "Jason Bateman"
> > or "Jack Black" ... searching for "Bateman Jack" or "Black Jason" will not
> > work.
> Another possible trick is to have one field, but mark both of its ends with
> tokens, say "^" and "$", so that "Jason Bateman" gets indexed as four
> tokens:
>      ^ Jason Bateman $
> Then, if you want to search for the name Jason Bateman and that name only,
> just search for the phrase "^ Jason Bateman $" - and only this entry will
> match. (you can also continue to search this field normally)
> If you think about this, you'll notice that you don't actually need
> the beginning-of-field marker ("^") because it's easy to recognize the
> beginning of a field because the position there is 0. Unfortunately,
> I don't know how to match position 0 using the standard QueryParser,
> but you can do it with the SpanFirstQuery: for example if we index
> Jason Bateman as the three tokens
>      Jason Bateman $
> then we can search for it using something like
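>      // needs org.apache.lucene.index.Term and org.apache.lucene.search.spans.*
>      SpanQuery[] terms = {
>            new SpanTermQuery(new Term("actor", "Jason")),
>            new SpanTermQuery(new Term("actor", "Bateman")),
>            new SpanTermQuery(new Term("actor", "$")) };
>      // slop 0, in order: the three terms must be adjacent; the match
>      // must end within the first 3 positions, i.e. start at position 0
>      SpanQuery query = new SpanFirstQuery(new SpanNearQuery(terms, 0, true), 3);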
> (or something like that... I didn't test this)
> --
> Nadav Har'El
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
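
The end-marker idea quoted above can be illustrated without Lucene at all.
The following is only a sketch of the matching logic in plain Java, with a
token list standing in for an indexed field. The class and method names
are hypothetical, and the whitespace tokenization is an assumption; a real
analyzer may lowercase terms or drop the "$" token entirely:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of exact-name matching with an end-of-field marker token.
class EndMarkerMatch {
    static final String END = "$";

    /** Splits a field value on whitespace and appends the end marker. */
    static List<String> indexTokens(String value) {
        List<String> tokens = new ArrayList<>(Arrays.asList(value.trim().split("\\s+")));
        tokens.add(END);
        return tokens;
    }

    /** True only if the field is exactly the name: same tokens from position 0, then the marker. */
    static boolean exactMatch(List<String> indexed, String name) {
        List<String> wanted = indexTokens(name);
        return indexed.size() >= wanted.size()
                && indexed.subList(0, wanted.size()).equals(wanted);
    }

    /** Ordinary loose search: the word appears anywhere in the field. */
    static boolean looseMatch(List<String> indexed, String word) {
        return indexed.contains(word);
    }

    public static void main(String[] args) {
        List<String> a = indexTokens("Jason Bateman");
        List<String> b = indexTokens("Jason Bateman Black");
        System.out.println(exactMatch(a, "Jason Bateman")); // true
        System.out.println(exactMatch(b, "Jason Bateman")); // false: "Black" precedes the marker
        System.out.println(looseMatch(b, "Bateman"));       // true: loose search still works
    }
}
```

Anchoring the comparison at position 0 plays the role of the "^" marker,
which is why only the "$" token has to be stored.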

Ananth T Sarathy
