lucene-java-user mailing list archives

From "Ananth T. Sarathy" <ananth.t.sara...@gmail.com>
Subject Re: Lucene Seaches VS. Relational database Queries
Date Thu, 13 Apr 2006 16:19:47 GMT
Yes, this is pretty much what I was trying to do. I am not sure how you
solve this without the maintenance and the duplicate-hit removal logic
taking up so much time that any performance benefit from querying Lucene is
lost.
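
To make the duplicate-removal step concrete, here is a rough plain-Java sketch. It is not Lucene API. It assumes each flattened "A join B" document stores the key of its row in A, and that the hits come back as a ranked list of those keys; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class HitDeduper {
    /**
     * Collapses ranked hits to one entry per row of A, keeping the
     * position of the first (best-ranked) hit for each key.
     */
    static List<Integer> dedupe(List<Integer> rankedAIds) {
        // LinkedHashSet drops repeats while preserving insertion order.
        return new ArrayList<>(new LinkedHashSet<>(rankedAIds));
    }

    public static void main(String[] args) {
        // Made-up hits: row 7 of A matched through two different rows of B.
        List<Integer> hits = Arrays.asList(7, 3, 7, 9, 3);
        System.out.println(dedupe(hits)); // prints [7, 3, 9]
    }
}
```

The pass itself is a single walk over the hits, so in a scheme like this the cost would mostly depend on how many join rows match rather than on the set operation.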

On 4/13/06, Satuluri, Venu_Madhav <Venu.Madhav.Satuluri@deshaw.com> wrote:
>
> I think you are asking if we can retain 1:n relationships in lucene.
>
> Ok, I'll go out on a limb and give my solution. Say you have a table A
> and table B with B having multiple rows associated to each row in A.
> Also your documents are centered around A, i.e. all your queries return
> some row(s) of A, not B, but you should be able to query on fields in B.
>
>
> In such a case, you need to have multiple documents for each row in A.
> To be more specific, if a row in A has 5 corresponding rows in B, then
> there must be 5 Documents in lucene index corresponding to A. In other
> words, each lucene Document corresponds to a row in 'A join B'.
>
> I am not sure of this scheme. If there are more tables, then this
> quickly explodes the number of documents. We'll have as many documents
> as there are rows in {A join B join C join D.. }. Plus, we'll need to
> remove from the Hits any Documents which correspond logically to the
> same row in A.
>
> Is there a better way to do this? Or am I not making sense?
>
>
> -----Original Message-----
> From: Ananth T. Sarathy [mailto:ananth.t.sarathy@gmail.com]
> Sent: Thursday, April 13, 2006 9:04 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene Seaches VS. Relational database Queries
>
>
> Ok,
> Some of the stuff makes some sense. I was a little loopy from lack of
> sleep and some of these solutions don't really cover my concerns....
>
>
> Let's take this movie example. Say each member of a production crew can
> have multiple titles that come from a lookup table of distinct jobs:
>
> Titles
> Assistant Producer
> Producer
> Executive Producer
> Director
> Director Trainee
> Stunt Director
>
> In the database there would be an association table linking each crew
> member to the titles they had:
>
> Crew_Titles
> Crew_ID   Title
> 1             Producer
> 1
>
> On 4/12/06, Nadav Har'El <NYH@il.ibm.com> wrote:
> >
> > Chris Hostetter <hossman_lucene@fucit.org> wrote on 12/04/2006 01:41:37 AM:
> > > : them in one field).  One of the problems I see would be with values that
> > > : overlap (Example, name where one name is Jason Bateman, and one is Jason
> > > : Bateman Black, and it would be hard to replicate the Discrete Search for
> > >
> > > The way field values are "analyzed" is extremely configurable -- down to
> > > the individual field level.  Which means that while you can have an actor
> > > field where you can do loose text searching for "bateman" and get back
> > > movies starring "Jason Bateman" and "Jason Bateman Black" (and even "Guido
> > > Batemans" if you use stemming), you can also have another field using a
> > > KeywordAnalyzer such that a record with the values "Jason Bateman" and
> > > "Jack Black" will only be matched if the user searches for "Jason Bateman"
> > > or "Jack Black" ... searching for "Bateman Jack" or "Black Jason" will not
> > > work.
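
As a rough illustration of the loose-versus-exact distinction described above, here is a plain-Java simulation. It does not use Lucene or any real analyzer: the "loose" side just lower-cases and splits on whitespace, and the "exact" side compares whole values, which is only an approximation of what a keyword-style field would do. All names are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical simulation, not Lucene code.
final class FieldMatchDemo {
    /** Loose match: the word equals any lower-cased word of any value. */
    static boolean looseMatch(List<String> values, String word) {
        Set<String> tokens = new HashSet<>();
        for (String value : values) {
            tokens.addAll(Arrays.asList(value.toLowerCase(Locale.ROOT).split("\\s+")));
        }
        return tokens.contains(word.toLowerCase(Locale.ROOT));
    }

    /** Exact match: the query must equal one whole, untokenized value. */
    static boolean exactMatch(List<String> values, String query) {
        return values.contains(query);
    }

    public static void main(String[] args) {
        List<String> actors = Arrays.asList("Jason Bateman", "Jack Black");
        System.out.println(looseMatch(actors, "bateman"));       // true
        System.out.println(exactMatch(actors, "Jason Bateman")); // true
        System.out.println(exactMatch(actors, "Bateman Jack"));  // false
    }
}
```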
> >
> > Another possible trick is to have one field, but mark its ends with special
> > tokens, say "^" and "$", so that "Jason Bateman" gets indexed as four
> > tokens:
> >      ^ Jason Bateman $
> > Then, if you want to search for the name Jason Bateman and that name only,
> > just search for the phrase "^ Jason Bateman $" - and only this entry will
> > match. (you can also continue to search this field normally)
> >
> > If you think about this, you'll notice that you don't actually need
> > the beginning-of-field marker ("^") because it's easy to recognize the
> > beginning of a field because the position there is 0. Unfortunately,
> > I don't know how to match position 0 using the standard QueryParser,
> > but you can do it with the SpanFirstQuery: for example if we index
> > Jason Bateman as the three tokens
> >      Jason Bateman $
> > then we can search for it using something like
> >      SpanQuery[] terms = {
> >            new SpanTermQuery(new Term("actor", "Jason")),
> >            new SpanTermQuery(new Term("actor", "Bateman")),
> >            new SpanTermQuery(new Term("actor", "$")) };
> >      new SpanFirstQuery(new SpanNearQuery(terms, 0, true), 3);
> > (or something like that... I didn't test this)
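
The end-marker idea above can be checked without an index. The following is a plain-Java simulation of the matching rule only, not Lucene code: it assumes whitespace tokenization, that "$" never occurs inside a name, and that "match starting at position 0" means the query tokens plus the marker are a prefix of the indexed tokens. The class and method names are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of the end-marker trick, not Lucene code.
final class EndMarkerDemo {
    static final String END = "$";

    /** Splits a name on whitespace and appends the end-of-field marker. */
    static List<String> index(String name) {
        List<String> tokens = new ArrayList<>(Arrays.asList(name.trim().split("\\s+")));
        tokens.add(END);
        return tokens;
    }

    /**
     * True when the query tokens followed by the marker sit at positions
     * 0..n of the indexed tokens, i.e. the field holds exactly this name.
     */
    static boolean matchesWholeField(List<String> indexed, String query) {
        List<String> wanted = index(query);
        return indexed.size() >= wanted.size()
                && indexed.subList(0, wanted.size()).equals(wanted);
    }

    public static void main(String[] args) {
        System.out.println(matchesWholeField(index("Jason Bateman"), "Jason Bateman"));       // true
        System.out.println(matchesWholeField(index("Jason Bateman Black"), "Jason Bateman")); // false
        System.out.println(matchesWholeField(index("Jason Bateman"), "Bateman"));             // false
    }
}
```

Anchoring at position 0 rules out matches that start mid-name, and the trailing marker rules out longer names, which is why the "^" token becomes unnecessary.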
> >
> >
> > --
> > Nadav Har'El
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
> --
> Ananth T Sarathy
>
>


--
Ananth T Sarathy
